NONINFORMATIVE PRIORS WITH APPLICATIONS


By 

MING  YIN 


A  DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 
OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 
DOCTOR  OF  PHILOSOPHY 

UNIVERSITY  OF  FLORIDA 


1997 


©  Copyright  1997 
by 

Ming  Yin 


ACKNOWLEDGEMENTS 


I  would  like  to  express  my  sincere  gratitude  to  my  advisor  Dr.  Malay  Ghosh  for 
his  enthusiastic  support  and  professional  guidance.  It  was  an  enjoyable  experience 
working  with  him  and  his  advice  was  very  valuable  in  accomplishing  my  research 
goals. 

I am very grateful to Dr. Myron Chang and Dr. Ronald Randles for their generous
help. I would also like to thank Dr. Pejaver V. Rao, Dr. Geoff Vining and Dr. Michael
Delorenzo for serving on my supervisory committee, and Dr. Murali Rao for sitting
in on my defense.

I  am  indebted  to  my  parents,  sisters  and  brother-in-law  for  their  continued  love, 
encouragement  and  support.  And,  as  always,  thanks  to  my  wife,  Ying  Lang,  for 
helping  me  in  more  ways  than  I  could  ever  say  or  even  remember. 




TABLE  OF  CONTENTS 


ACKNOWLEDGEMENTS    iii 

LIST  OF  TABLES    vi 

LIST  OF  FIGURES    vii 

ABSTRACT    viii 

CHAPTERS 

1  INTRODUCTION    1 

1.1  Literature  Review    1 

1.2  The  Subject  of  This  Dissertation    4 

2  THE  GENERALIZED  FIELLER-CREASY  PROBLEM    7 

2.1  Introduction    7 

2.2  Non-informative  Priors    8 

2.3  Posterior  Analysis    13 

2.4  Simulation  Study    22 

2.5  Discussion  and  Conclusions    24 

3  THE  SLOPE-RATIO  PROBLEMS    29 

3.1  Introduction    29 

3.2  Slope-ratio  Problem    31 

3.3  Posterior  Analysis    37 

3.4  Numerical  Examples    46 

3.5  Multiple  Linear  Regression    52 

4  THE  MULTIVARIATE  LINEAR  CALIBRATION  PROBLEM    64 

4.1  Introduction    64 

4.2  Multivariate Linear Calibration - $\xi$ is a scalar    66

4.3  Multivariate Linear Calibration - $\xi$ is a vector    74

4.4  Appendix:    80 



5  ASYMPTOTIC EXPANSIONS FOR POSTERIOR PROBABILITY IN REGRESSION MODEL    85

5.1  Introduction    85 

5.2  Model,  Notations  and  Assumptions    86 

5.3  Strong  Consistency    89 

5.4  Asymptotic  Expansions  of  Posterior  Probability    93 

5.5  Further  Discussions   101 

6    SUMMARY  AND  FUTURE  RESEARCH   102 

6.1  Summary   102 

6.2  Future  Research   103 

REFERENCES   104 

BIOGRAPHICAL  SKETCH   109 




LIST  OF  TABLES 


Table  Page 

2.1  Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of $\theta$
when $n = 5$, $\nu = \infty$ (normal)   26

2.2  Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of $\theta$
when $n = 10$, $\nu = \infty$ (normal)   26

2.3  Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of $\theta$
when $n = 5$, $\nu = 10$   27

2.4  Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of $\theta$
when $n = 10$, $\nu = 10$   27

2.5  Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of $\theta$
when $n = 5$, $\nu = 1$   28

2.6  Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of $\theta$
when $n = 10$, $\nu = 1$   28

3.1  Design  1    47 

3.2  Design  2   47 

3.3  Responses  in  an  assay  of  riboflavin  in  malt    48 

3.4  Posterior  quantiles  and  probabilities  based  on  Finney's  data   50 

3.5  A  simulated  data  for  slope-ratio  problem    52 

4.1    Catalog  of  reference  priors   79 




LIST  OF  FIGURES 


Figure  Page 

2.1  a.)  Posteriors  Based  On  Noninformative  Priors    20 

2.2  b.)  Posteriors  Based  On  Noninformative  Priors    22 

3.1  Posterior Comparison Between Two Designs   46

3.2  Posteriors  Based  On  Finney's  Data   49 

3.3  Posteriors  Based  On  Simulated  Data    51 




Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment 
of  the  Requirements  for  the  Degree  of 
Doctor  of  Philosophy 

NONINFORMATIVE PRIORS WITH APPLICATIONS

By 
Ming  Yin 
December  1997 

Chairman:  Malay  Ghosh 
Major  department:  Statistics 

In this dissertation, various noninformative priors, in particular probability
matching priors, are derived for multivariate linear calibration problems and for a
general problem of estimating a ratio of two linear combinations of coefficients in the
multiple linear regression model. The latter includes the famous Fieller-Creasy problem
(ratio of two normal means), the slope-ratio problem (in bioassay), the univariate linear
calibration problem and many other problems which are usually quite challenging from
either a frequentist or a likelihood based approach. Also, the validity of probability
matching priors in linear regression models is verified. In this study, instead of the
usual normality assumption for the error distribution, only smoothness and symmetry
conditions are assumed.

First, a sufficient condition is given in multiple linear regression such that a
common probability matching prior exists. This prior does not depend on the choice of
the error distribution (among the class of smooth and symmetric distributions) or the
design matrix. This sufficient condition holds for many situations.




For generalized Fieller-Creasy problems, we study the properties of the posterior
distributions for a subclass of error distributions, including the normal, t and double
exponential. The Bayesian procedure is implemented via Markov Chain Monte Carlo
(MCMC). Our simulation study indicates that the second order probability matching
prior performs very well (better than reference priors; Bernardo 1979, Berger &
Bernardo 1989, 1992ab) in terms of matching the target coverage probabilities in a
frequentist sense. This performance is also robust across different error distributions.

For slope-ratio problems, we find an orthogonal transformation and derive a class
of first order probability matching priors. A second order probability matching prior
is also derived which does not depend on the choice of the covariates or the error
distribution. Properties of the posterior distributions are discussed for the normal, t
and double exponential cases. More detailed analyses are given in the normal case.
Design issues are addressed so that the posterior provides more accurate information
about the true parameter. Several numerical examples are studied.

For multivariate linear calibration problems, a class of first order probability
matching priors and a complete catalog of reference priors are derived when the
explanatory variable is a vector, while second order probability matching priors are
derived when the explanatory variable is a scalar. Posteriors based on the probability
matching priors and the reference priors are derived, and their properties are
discussed.

Finally, in order to justify probability matching priors in non-i.i.d. cases,
asymptotic expansions of the posterior probabilities in linear regression models
are derived. This work extends the results of Johnson (1970), which are based on a
one-parameter family of distributions in the independent and identically distributed
(i.i.d.) case. Strong consistency of the maximum likelihood estimators of the
parameters is proved.




CHAPTER  1 
INTRODUCTION 

1.1    Literature  Review 

Bayesian  methods  have  become  increasingly  popular  in  the  theory  and  practice  of 
statistics.  This  is  partly  due  to  the  fact  that  even  with  little  or  no  prior  information, 
one  can  often  employ  noninformative  priors  to  draw  reliable  inference.  Thus,  not 
surprisingly,  over  the  years,  a  wide  range  of  noninformative  priors  has  been  proposed 
and  studied. 

The earliest use of noninformative priors is attributed to Laplace (1812), who
recommended using a flat prior over the entire parameter space. Such a prior, though
frequently used, is subject to the obvious criticism that it does not remain invariant
under one-to-one reparameterization. This is especially bothersome since there is no
unique parameterization for most statistical models.

Thus, Jeffreys (1961) proposed a prior which remains invariant under any one-to-
one reparameterization. This prior is derived simply as the positive square root of
the determinant of the Fisher information matrix. Despite this invariance, Jeffreys'
prior has often been criticized in the presence of nuisance parameters. For example,
Bernardo (1979) has shown that Jeffreys' prior can lead to the marginalization
paradox (cf. Dawid, Stone and Zidek, 1973) for inference about $\mu/\sigma$ when the model is
$N(\mu, \sigma^2)$. A second example due to Berger and Bernardo (1992b) shows that Jeffreys'
prior can lead to an inconsistent estimator of the error variance in the balanced one-way
normal ANOVA model when the number of cells grows to infinity in direct proportion
to the sample size. So Jeffreys' prior fails to avoid the Neyman-Scott phenomenon




(Neyman and Scott, 1948): in such a case the prior leads to an inconsistent estimator
of the error variance in the balanced one-way normal ANOVA model when the
number of unknown treatment parameters goes to infinity.

In recognition of these problems, Bernardo (1979) proposed a class of noninformative
priors which have become known as "reference priors." The key feature of this
development is splitting the parameters into parameters of interest and nuisance
parameters. This approach was further extended in Berger and Bernardo (1989, 1992ab).

A somewhat different criterion for developing noninformative priors is based on
matching the coverage probabilities of Bayesian credible sets with the corresponding
frequentist confidence sets up to a certain order. As noted in Tibshirani (1989), these
priors provide a method for constructing accurate frequentist confidence regions, and
such studies are also helpful in defining noninformative priors which could be
potentially useful for comparative purposes in a Bayesian analysis. Such matching is
accomplished either (a) through posterior quantiles, (b) through the Highest Posterior
Density (HPD) regions, or (c) through the inversion of likelihood ratio and related
statistics. In the literature, (a), (b) and (c) have received attention in decreasing order
of importance. We only discuss case (a) in this dissertation.

To be more specific about the definition of the probability matching priors based
on posterior quantiles, let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. real- or vector-valued
random variables with common pdf $f(x, \theta)$, where $\theta = (\theta_1, \ldots, \theta_p)^T$ belongs to some
open subset of $R^p$, and $\theta_1$ is the parameter of interest. Denote by $P^{\pi}(\cdot \mid X)$ the
posterior probability measure for $\theta_1$ under a prior $\pi(\theta)$, where $X = (X_1, \ldots, X_n)^T$.

Let $\theta_1^{(1-\alpha)}(\pi, X)$ denote the $100(1-\alpha)\%$ posterior quantile of $\theta_1$. A prior $\pi(\theta)$ is
called a $\nu$th order probability matching prior based on posterior quantiles if it satisfies
$P(\theta_1 \le \theta_1^{(1-\alpha)}(\pi, X) \mid \theta) = 1 - \alpha + o(n^{-\nu/2})$. Note that $\nu = 1$ corresponds to the first




order probability matching prior and $\nu = 2$ corresponds to the second order probability
matching prior (cf. Ghosh and Mukerjee 1996). Usually, a probability matching
prior with order higher than 2 does not exist, except in some irregular cases.
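
The matching property can be checked by simulation in a case where it is known to hold exactly. The sketch below is only an illustration under stated assumptions: it uses a normal mean with known standard deviation and a flat prior, for which the posterior is $N(\bar{x}, \sigma^2/n)$, and estimates the frequentist probability that the true mean lies below the 95% posterior quantile. The function names, sample size and number of replications are hypothetical choices, not taken from the dissertation.

```python
import random
from statistics import NormalDist, fmean


def posterior_quantile(xs, sigma, level):
    """Posterior quantile of a normal mean under a flat prior (sigma known)."""
    n = len(xs)
    posterior = NormalDist(mu=fmean(xs), sigma=sigma / n ** 0.5)
    return posterior.inv_cdf(level)


def coverage(theta, sigma, n, level, reps, seed=0):
    """Monte Carlo estimate of P(theta <= posterior quantile | theta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(theta, sigma) for _ in range(n)]
        if theta <= posterior_quantile(xs, sigma, level):
            hits += 1
    return hits / reps


# Illustrative settings (assumptions): true mean 1.5, sigma 2, n = 5.
estimate = coverage(theta=1.5, sigma=2.0, n=5, level=0.95, reps=20000)
print(f"estimated coverage of the 0.95 posterior quantile: {estimate:.3f}")
```

In this location example the estimate should be close to 0.95 up to Monte Carlo error, since the flat prior matches exactly; for the ratio problems studied later, matching holds only asymptotically.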

Fisher (1934) derived an exact match between the conditional distribution of the
MLE of $\theta$ given an ancillary statistic and the Bayesian posterior distribution
under a flat prior for the case of a location parameter. A brief introduction to
matching priors is given in Lindley (1958), but their systematic study began essentially
with Welch and Peers (1963), who considered the case where there are no nuisance
parameters. The authors showed in this case that the unique prior satisfying the
first order probability matching property is Jeffreys' prior. Peers (1965) extended the
case of a one-dimensional parameter to the case of multiple parameters. This first
order probability matching prior work was pursued further in Stein (1985), Tibshirani
(1989), Datta and Ghosh (1995a, 1995b), among others.

However, for many problems, there are infinitely many first order probability
matching priors. Tibshirani (1989) provides a complete characterization of such
priors when the real valued parameter of interest, say $\theta_1$, is orthogonal to the nuisance
parameters (cf. Cox and Reid, 1987), that is, $I_{1j}(\theta) = 0$ for $j = 2, 3, \ldots, p$, where
$I(\theta) = ((I_{ij}(\theta)))$, $i, j = 1, 2, \ldots, p$, is the Fisher information matrix. Mukerjee and
Dey (1993) narrowed the selection of priors from within the class of first order
probability matching priors by requiring coverage matching up to $o(n^{-1})$ instead of $o(n^{-1/2})$.
These results were later generalized by Mukerjee and Ghosh (1996) in the presence
of several nuisance parameters. These findings indicate that second order probability
matching priors need not always exist, while sometimes all first order probability
matching priors also meet the second order matching criterion. Barring these extreme
situations, second order matching helps narrow down the selection of priors and
indeed often helps in finding a unique prior within the class of first order probability
matching priors.




Probability matching priors are invariant under parametric transformation (cf.
Datta and Ghosh, 1996; Mukerjee and Ghosh, 1996), while reference priors are
invariant only under parametric transformation within each parameter group. Datta
and Ghosh (1995) gave a sufficient condition such that a reference prior is also a
reverse reference prior (Ghosh and Mukerjee, 1992), and thus is invariant under
parametric transformation.

All studies of the probability matching priors described above are based on the
assumption that observations are i.i.d. Lee (1989) discussed probability matching priors
for one-sided and symmetric confidence intervals in non-i.i.d. cases. His derivations are
heavily based on Durbin's work (1980), which assumes that the maximum likelihood
estimators (MLE) of the parameters are sufficient statistics. Such a requirement usually
does not hold outside of the exponential family.

Probability matching priors for specific problems have been investigated by quite a few
authors. Ghosh, Carlin and Srivastava (1995) derived and studied a class of first
order probability matching priors and a complete catalog of reference priors in the
univariate linear calibration problem. Ghosh and Yang (1996) derived a class of first
order probability matching priors for the two sample normal problem; among these priors,
one is recommended such that the marginalization paradox can be avoided. Sun and Ye
(1996) considered a two-parameter exponential family with first and second order
probability matching priors. Garven and Ghosh (1996) studied probability matching
and reference priors in a more general case, namely the generalized dispersion models.

1.2    The  Subject  of  This  Dissertation 

In Chapter 2, we find a second order probability matching prior and a one-at-a-
time reference prior which work well for the Fieller-Creasy problem in the more general
setting of two location-scale models with smooth symmetric density functions. The
properties of the posterior distributions are investigated for some particular cases




including the normal, t, and the double exponential. The Bayesian procedure is
implemented via Markov Chain Monte Carlo (MCMC). Our simulation study indicates
that the second order probability matching priors, in general, perform better than the
reference priors in terms of matching the target coverage probabilities in a frequentist
sense.

Chapter 3 considers the generalized slope-ratio problem (the error distribution is
assumed to be symmetric with a smooth density function). An orthogonal
transformation is found by following Cox and Reid (1987). The class of first order probability
matching priors, the reference priors, Jeffreys' prior and a class of second order
probability matching priors are derived. Among the second order probability matching
priors, a prior which does not depend on the covariates or the error distribution is
recommended. Properties of the posteriors are discussed for a subclass including the
normal, t and double exponential. For the normal case, the exact forms of the posterior
distributions are derived, the design issues are discussed, and it is found that the
posteriors will provide more correct information about the location of the true
parameter. These ideas are illustrated by several numerical examples. In this chapter,
multiple linear regression models are also considered. When the parameter of interest
is a ratio of two linear combinations of the coefficients, a sufficient condition involving
the design matrix is given such that an orthogonal parametric transformation exists.
This general transformation works for the generalized Fieller-Creasy problem, but not
for the slope-ratio problem unless the covariates are centered. Under this sufficient
condition, a second order probability matching prior is derived. This prior does not
depend on the choice of the design matrix (covariates) or the error distribution. The
explicit forms of the posterior distributions based on the common second order probability
matching prior are given in the normal case.




Chapter 4 considers multivariate linear calibration problems. For a scalar
explanatory variable (unidimensional multivariate calibration problems), an orthogonal
transformation is found and the complete class of first order probability matching priors is
derived. A class of second order probability matching priors is also derived when the
error covariance matrix is known or is known up to a scalar variable. It turns out that
the prior of Hunter and Lamboy (1981) is a second order probability matching prior
for the univariate calibration problem. When the explanatory variable is a vector, a
general class of first order probability matching priors and a complete catalog of reference
priors are derived. A reference prior is also a first order probability matching prior if
and only if it is the univariate calibration case. The marginal posterior densities are
derived and the properties of the marginal posterior densities are discussed.

In Chapter 5, the asymptotic expansions of the posterior probability in the general
linear regression model are derived. We show that under simple regularity conditions
for the error distribution and standard conditions for the design matrix, the suitably
centered and scaled posterior probability of any (p+2) dimensional Borel set possesses an
asymptotic expansion in powers of $n^{-1/2}$, with the standard multivariate normal as the
leading term. This work extends the results of Johnson (1970), which are based on a
one-parameter family of distributions in the independent and identically distributed (i.i.d.)
case. Strong consistency of the maximum likelihood estimators of the parameters is
also proved. We refer to Skovgaard (1981) for the validity of Edgeworth assumptions.
The validity of the probability matching priors reviewed in Chapter 1 may thereby be justified.

In Chapter 6, we summarize the results in this dissertation and propose several
topics for future research.


CHAPTER  2 

NON-INFORMATIVE  PRIORS  FOR  THE  GENERALIZED  FIELLER-CREASY  PROBLEM 

2.1  Introduction 

The celebrated Fieller-Creasy problem (Fieller, 1954; Creasy, 1954) involves
inference about the ratio of two normal means. This problem has posed a constant
challenge to frequentist and likelihood based inference. Fieller's method of providing
a confidence set for this ratio based on a pivot can lead to two disjoint unbounded
sets or even the whole real line. As pointed out by Gleser and Hwang (1987) (see also
Berger, Liseo and Wolpert (1996)), based on any sample of arbitrary (but fixed) size
n, a confidence interval of finite expected length for this ratio has coverage probability
(taking the infimum over all points in the parameter space) equal to 0. On the other
hand, the profile likelihood and the modified profile likelihood (Barndorff-Nielsen,
1983; Fraser and Reid, 1989) have the unpleasant property that the likelihood is
bounded away from zero when the parameter goes to infinity (see McCullagh and
Tibshirani, 1990; Liseo, 1993).

Bayesian  analysis  for  this  problem  based  on  non-informative  priors  began  with 
Kappenman,  Geisser  and  Antle  (1970),  and  was  addressed  subsequently  in  Bernardo 
(1977),  Sendra  (1982),  Mendoza  (1987,  1996),  Stephens  and  Smith  (1992),  Liseo 
(1993),  Phillipe  and  Robert  (1995),  and  Berger,  Liseo  and  Wolpert  (1996). 

Liseo  (1993)  compared  profile  likelihood  and  its  modification  with  a  Bayesian 
analysis  based  on  reference  priors  (Berger  &  Bernardo,  1992a),  when  the  variance  is 
known.  Mendoza  (1996)  investigated  the  frequentist  coverage  probabilities  of  HPD 
intervals  for  reference  priors  through  simulations,  when  the  variance  is  unknown.  In 




general, the integrated likelihood approach with non-informative priors is advocated
by Berger et al. (1996) for tackling a class of important but troublesome problems,
including the Fieller-Creasy problem.

This chapter considers a generalized Fieller-Creasy problem which involves
inference about the ratio of two location parameters for two independent symmetric
location-scale distributions. Other than the normal, this class of distributions also
includes the t and the double exponential distributions. In Section 2, we find
second order matching priors (defined there) as well as two-group and
one-at-a-time reference priors. An interesting feature is that these priors remain
the same for every member of the location-scale family. In Section 3, we study the
properties of the posteriors for two important subclasses of distributions, which include
the normal, t and double exponential distributions. The normal case is discussed
at some length, and posteriors generated by second order probability matching priors
are compared with those that are already available, and also
with likelihood based methods. In this process, we have also corrected a result of
Liseo (1993). Implementation of the Bayes procedure via Markov Chain Monte Carlo
(MCMC) integration techniques is provided in Section 4. Finally, Section 5 contains
some discussion and concluding remarks.

2.2    Non-informative  Priors 

2.2.1    Probability  matching  priors 

Let $x_1, \ldots, x_n$; $y_1, \ldots, y_n$ denote random samples from two populations with
respective pdf's $\sigma^{-1} f\left(\frac{x - \mu}{\sigma}\right)$ and $\sigma^{-1} f\left(\frac{y - \theta\mu}{\sigma}\right)$, where $f(z) = f(-z)$. The parameter
of interest is $\theta$, the ratio of the two location parameters. We assume the regularity
conditions of Johnson (1970).




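To obtain these priors, first we find the per unit Fisher information matrix as

$$
I(\theta, \mu, \sigma) = \sigma^{-2}
\begin{pmatrix}
c_1 \mu^2 & c_1 \theta \mu & 0 \\
c_1 \theta \mu & c_1 (1 + \theta^2) & 0 \\
0 & 0 & c_2
\end{pmatrix},
$$

where

$$
c_1 = \int_{-\infty}^{\infty} [f'(x)/f(x)]^2 f(x)\,dx, \qquad
c_2 = 2\left[\int_{-\infty}^{\infty} x^2 [f'(x)/f(x)]^2 f(x)\,dx - 1\right].
$$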
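
The constant $c_1 = \int [f'(x)/f(x)]^2 f(x)\,dx$ and the second-moment integral $\int x^2 [f'(x)/f(x)]^2 f(x)\,dx$ that enters $c_2$ can be evaluated numerically for a given density. The following is a minimal sketch, assuming a composite Simpson rule on a truncated range; the helper names, the truncation limits and the grid size are illustrative choices rather than part of the original derivation. For the standard normal the two integrals are 1 and 3, and for the t density with $\nu$ degrees of freedom the first is $(\nu+1)/(\nu+3)$.

```python
import math


def simpson(g, lo, hi, m=20000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def normal_score(x):
    """f'(x) / f(x) for the standard normal density."""
    return -x


def t_pdf(x, nu):
    log_k = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_k - 0.5 * (nu + 1) * math.log1p(x * x / nu))


def t_score(x, nu):
    """f'(x) / f(x) for the t density with nu degrees of freedom."""
    return -(nu + 1) * x / (nu + x * x)


def score_moment(pdf, score, power, limit):
    """Integral of x**power * (f'/f)**2 * f over [-limit, limit]."""
    return simpson(lambda x: x ** power * score(x) ** 2 * pdf(x), -limit, limit)


c1_normal = score_moment(normal_pdf, normal_score, 0, 12.0)
m2_normal = score_moment(normal_pdf, normal_score, 2, 12.0)
c1_t10 = score_moment(lambda x: t_pdf(x, 10), lambda x: t_score(x, 10), 0, 200.0)
print(c1_normal, m2_normal, c1_t10)
```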


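Next, we find the orthogonal parametric transformation for $\theta$.

Following Cox and Reid (1987), $\theta_1 = \theta_1(\theta, \mu, \sigma)$, $\theta_2 = \theta_2(\theta, \mu, \sigma)$ and $\theta_3 = \theta_3(\theta, \mu, \sigma)$
is an orthogonal parametric transformation of $(\theta, \mu, \sigma)$ if and only if the corresponding
$\theta(\theta_1, \theta_2, \theta_3)$, $\mu(\theta_1, \theta_2, \theta_3)$ and $\sigma(\theta_1, \theta_2, \theta_3)$ satisfy:

$$
I_{22}(\theta, \mu, \sigma)\frac{\partial \mu}{\partial \theta_1} + I_{23}(\theta, \mu, \sigma)\frac{\partial \sigma}{\partial \theta_1} = -I_{21}(\theta, \mu, \sigma),
$$

$$
I_{32}(\theta, \mu, \sigma)\frac{\partial \mu}{\partial \theta_1} + I_{33}(\theta, \mu, \sigma)\frac{\partial \sigma}{\partial \theta_1} = -I_{31}(\theta, \mu, \sigma),
$$

where $I_{ij}(\theta, \mu, \sigma)$ is the $(i,j)$th element of the Fisher information matrix $I(\theta, \mu, \sigma)$.
These equations simplify to

$$
\frac{\partial \mu}{\partial \theta_1} = -\frac{\theta \mu}{1 + \theta^2}, \tag{2.2.1}
$$

$$
\frac{\partial \sigma}{\partial \theta_1} = 0. \tag{2.2.2}
$$

One solution of (2.2.1) - (2.2.2) is given by $\theta = \theta_1$, $\mu = \theta_2 (1 + \theta_1^2)^{-1/2}$, $\sigma = \theta_3$.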




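Hence an orthogonal parametric transformation is given by

$$
\theta_1 = \theta, \qquad \theta_2 = \mu (1 + \theta^2)^{1/2}, \qquad \theta_3 = \sigma,
$$

which leads to the per unit Fisher information matrix

$$
I(\theta_1, \theta_2, \theta_3) = \theta_3^{-2}\, \mathrm{Diag}\{c_1 \theta_2^2 (1 + \theta_1^2)^{-2},\ c_1,\ c_2\}. \tag{2.2.3}
$$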
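
The diagonal form of the information matrix under the transformation $\theta_1 = \theta$, $\theta_2 = \mu(1+\theta^2)^{1/2}$, $\theta_3 = \sigma$ can be confirmed numerically by applying the usual rule $J^T I J$, where $J$ is the Jacobian of $(\theta, \mu, \sigma)$ with respect to $(\theta_1, \theta_2, \theta_3)$. The sketch below is an illustration only; the function names and the test point are hypothetical, and the values of $c_1$ and $c_2$ are arbitrary positive constants.

```python
import math


def info_original(theta, mu, sigma, c1, c2):
    """Per unit Fisher information in the (theta, mu, sigma) parameterization."""
    s = sigma ** -2
    return [
        [s * c1 * mu * mu, s * c1 * theta * mu, 0.0],
        [s * c1 * theta * mu, s * c1 * (1 + theta * theta), 0.0],
        [0.0, 0.0, s * c2],
    ]


def jacobian(t1, t2, t3):
    """d(theta, mu, sigma) / d(theta1, theta2, theta3) for mu = theta2 / sqrt(1 + theta1^2)."""
    s = 1 + t1 * t1
    return [
        [1.0, 0.0, 0.0],
        [-t1 * t2 * s ** -1.5, s ** -0.5, 0.0],
        [0.0, 0.0, 1.0],
    ]


def info_orthogonal(t1, t2, t3, c1, c2):
    """Information matrix in (theta1, theta2, theta3), computed as J^T I J."""
    mu = t2 / math.sqrt(1 + t1 * t1)
    info = info_original(t1, mu, t3, c1, c2)
    jac = jacobian(t1, t2, t3)
    return [
        [sum(jac[a][i] * info[a][b] * jac[b][j] for a in range(3) for b in range(3))
         for j in range(3)]
        for i in range(3)
    ]


# Arbitrary illustrative point and constants (assumptions).
t1, t2, t3, c1, c2 = 0.7, -1.3, 2.0, 1.0, 4.0
transformed = info_orthogonal(t1, t2, t3, c1, c2)
expected_diag = [c1 * t2 ** 2 / (t3 ** 2 * (1 + t1 ** 2) ** 2), c1 / t3 ** 2, c2 / t3 ** 2]
for row in transformed:
    print([round(v, 10) for v in row])
```

The off-diagonal entries vanish up to rounding, and the diagonal agrees with $\theta_3^{-2}\,\mathrm{Diag}\{c_1\theta_2^2(1+\theta_1^2)^{-2}, c_1, c_2\}$.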

Then from Tibshirani (1989), the class of first order matching priors is characterized
by:

$$
\pi(\theta_1, \theta_2, \theta_3) \propto |\theta_2|\, \theta_3^{-1} (1 + \theta_1^2)^{-1} g(\theta_2, \theta_3),
$$

where $g(\cdot)$ is an arbitrary positive function of $\theta_2$ and $\theta_3$, differentiable in each
argument.

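The function $g(\cdot)$ being arbitrary, there are infinitely many first order probability
matching priors. Second order probability matching priors narrow down the selection
within this class, and are found from (2.10) of Mukerjee and Ghosh (1996). To find
this prior for this problem, first we write the likelihood function

$$
L(\theta_1, \theta_2, \theta_3) = \theta_3^{-2n} \prod_{i=1}^{n} f\left(\frac{x_i - \theta_2 (1 + \theta_1^2)^{-1/2}}{\theta_3}\right) f\left(\frac{y_i - \theta_1 \theta_2 (1 + \theta_1^2)^{-1/2}}{\theta_3}\right). \tag{2.2.4}
$$

Then, after simplification,

$$
L_{111} = n^{-1} E\left[\frac{\partial^3 \log L(\theta)}{\partial \theta_1^3}\right] = 0, \tag{2.2.5}
$$

$$
L_{112} = n^{-1} E\left[\frac{\partial^3 \log L(\theta)}{\partial \theta_1^2\, \partial \theta_2}\right] = -c_1 \frac{\theta_2}{(1 + \theta_1^2)^2 \theta_3^2}, \tag{2.2.6}
$$

$$
L_{113} = n^{-1} E\left[\frac{\partial^3 \log L(\theta)}{\partial \theta_1^2\, \partial \theta_3}\right] = (2c_1 + c_3) \frac{\theta_2^2}{(1 + \theta_1^2)^2 \theta_3^3}, \tag{2.2.7}
$$

where $c_3 = -\int_{-\infty}^{\infty} x\, \frac{d^3 \log f(x)}{dx^3}\, f(x)\,dx$.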




Now, from (2.10) of Mukerjee and Ghosh (1996), a second order probability matching
prior is found as a solution of

$$
\theta_3 \frac{\partial}{\partial \theta_2} g(\theta_2, \theta_3) + c\, \theta_2 \frac{\partial}{\partial \theta_3} g(\theta_2, \theta_3) = 0, \tag{2.2.8}
$$

where $c = (2c_1 + c_3)/c_2$.

A general class of solutions to (2.2.8) is given by $g(\theta_2, \theta_3) = h(c\,\theta_2^2 - \theta_3^2)$, where
$h(\cdot)$ is an arbitrary function differentiable in both $\theta_2$ and $\theta_3$. But this function, in
general, will depend on $c$ unless it is a constant. Since $c$ depends on the particular
pdf, a recommended choice is $g(\theta_2, \theta_3) = $ constant. The corresponding second order
matching prior is now

$$
\pi_m(\theta_1, \theta_2, \theta_3) \propto |\theta_2|\, \theta_3^{-1} (1 + \theta_1^2)^{-1}. \tag{2.2.9}
$$
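
That any function of $c\,\theta_2^2 - \theta_3^2$ solves the equation $\theta_3\, \partial g/\partial \theta_2 + c\, \theta_2\, \partial g/\partial \theta_3 = 0$ can be checked with finite differences. The sketch below is a numerical sanity check under assumptions: the choice $h = \sin$, the value of $c$, the evaluation point and the step size are all arbitrary illustrations.

```python
import math


def g(theta2, theta3, c, h=math.sin):
    """Candidate solution g = h(c * theta2^2 - theta3^2) for a smooth h."""
    return h(c * theta2 ** 2 - theta3 ** 2)


def pde_residual(theta2, theta3, c, h=math.sin, step=1e-5):
    """theta3 * dg/dtheta2 + c * theta2 * dg/dtheta3 via central differences."""
    dg_d2 = (g(theta2 + step, theta3, c, h) - g(theta2 - step, theta3, c, h)) / (2 * step)
    dg_d3 = (g(theta2, theta3 + step, c, h) - g(theta2, theta3 - step, c, h)) / (2 * step)
    return theta3 * dg_d2 + c * theta2 * dg_d3


# Arbitrary evaluation point and constant c (assumptions).
residual = pde_residual(0.8, 1.7, c=0.75)
residual_exp = pde_residual(-0.4, 0.9, c=2.0, h=lambda w: math.exp(-w * w))
print(residual, residual_exp)
```

Both residuals are zero up to discretization error. Because the argument of $h$ involves $c$, which changes with the error density, only a constant $g$ gives a prior that is common to the whole family.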

Remark 2.2.1 Due to invariance of probability matching priors (Datta and Ghosh,
1996; Mukerjee and Ghosh, 1996), the second order matching prior in the $(\theta, \mu, \sigma)$
parameterization reduces to

$$
\pi_m(\theta, \mu, \sigma) \propto \frac{|\mu|}{\sigma}.
$$

2.2.2    Reference  Priors 


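Due to the orthogonality of the parameters, following Bernardo (1979), Berger and
Bernardo (1989, 1992ab) and Datta and Ghosh (1996), using rectangular compacts
for $\theta_1$, $\theta_2$ and $\theta_3$, the two-group reference prior is given by

$$
\pi_{R2}(\theta_1, \theta_2, \theta_3) \propto \frac{1}{\theta_3^2 (1 + \theta_1^2)},
$$

while the one-at-a-time reference prior is given by

$$
\pi_R(\theta_1, \theta_2, \theta_3) \propto \frac{1}{\theta_3 (1 + \theta_1^2)}.
$$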




Each one is a first order probability matching prior by proper choice of $g(\theta_2, \theta_3)$, but
none is a second order probability matching prior. Jeffreys' prior, given by

$$
\pi_J(\theta_1, \theta_2, \theta_3) \propto \frac{|\theta_2|}{\theta_3^3 (1 + \theta_1^2)},
$$

is also a first order probability matching prior, but is not a second order probability
matching prior.

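Remark 2.2.2 We may also note that due to invariance (Datta and Ghosh, 1996),
the reference priors and Jeffreys' prior in the $(\theta, \mu, \sigma)$ parameterization are given
respectively by

$$
\pi_{R2}(\theta, \mu, \sigma) \propto \sigma^{-2} (1 + \theta^2)^{-1/2},
$$

$$
\pi_R(\theta, \mu, \sigma) \propto \sigma^{-1} (1 + \theta^2)^{-1/2},
$$

$$
\pi_J(\theta, \mu, \sigma) \propto |\mu|\, \sigma^{-3}.
$$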

Remark 2.2.3 In the special case when $\sigma$ (or $\theta_3$) is known, the unique second order
probability matching prior is given by $\pi_m(\theta_1, \theta_2) \propto |\theta_2| (1 + \theta_1^2)^{-1}$, which is the same
as Jeffreys' prior. This was also noticed by Mukerjee and Dey (1993) in the normal case.
The reference prior, on the other hand, is given by $\pi_R(\theta_1, \theta_2) \propto (1 + \theta_1^2)^{-1}$, which is a
first order probability matching prior, but is not a second order probability matching
prior.


Remark  2.2.4  The  one-at-a-time  reference  prior  was  also  derived  by  Mendoza  (1996) 
in  the  normal  case. 




2.3    Posterior  Analysis 

A detailed study of posteriors under the proposed priors requires specification
of $f(\cdot)$. We concentrate on two important families of distributions: (i) the power
family, and (ii) the t-family. The former includes the normal and the double
exponential distributions as special cases. The latter includes the normal distribution as a
limit. For each family, we investigate the propriety of posteriors for a general class of
priors which includes the reference prior $\pi_R$, Jeffreys' prior $\pi_J$, and the second order
matching prior $\pi_m$. In what follows, we shall write $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$.

We  begin  with  the  power  family.  The  following  general  theorem  can  be  proved. 

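Theorem 2.3.1 Let $f(z) = k(\delta) \exp(-|z|^{\delta})$, $\delta \ge 1$, where $k(\delta)$ is the normalizing
constant. Consider the class of priors $\pi_{a,\alpha}(\theta, \mu, \sigma) \propto |\mu|^{\alpha} \sigma^{-a} (1 + \theta^2)^{-(1-\alpha)/2}$, $0 \le \alpha \le 1$,
$a > 0$. Then the posterior $\pi_{a,\alpha}(\theta, \mu, \sigma \mid x, y)$ is proper if $2n + a > 3$.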

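Proof of Theorem 2.3.1 The joint posterior of $\theta$, $\mu$ and $\sigma$ is given by

$$
\pi_{a,\alpha}(\theta, \mu, \sigma \mid x, y) \propto \frac{1}{\sigma^{2n+a}} \exp\left\{-\frac{1}{\sigma^{\delta}}\left[\sum_{i=1}^{n} |x_i - \mu|^{\delta} + \sum_{i=1}^{n} |y_i - \theta\mu|^{\delta}\right]\right\} \frac{|\mu|^{\alpha}}{(1 + \theta^2)^{(1-\alpha)/2}}.
$$

Integrating with respect to $\sigma$, the joint posterior of $\theta$ and $\mu$ is given by

$$
\pi_{a,\alpha}(\theta, \mu \mid x, y) \propto \frac{|\mu|^{\alpha}}{\left[\sum_{i=1}^{n} |x_i - \mu|^{\delta} + \sum_{i=1}^{n} |y_i - \theta\mu|^{\delta}\right]^{(2n-1+a)/\delta} (1 + \theta^2)^{(1-\alpha)/2}}.
$$

Now, letting $u = \theta\mu$, the joint posterior of $u$ and $\mu$ is given by

$$
\pi_{a,\alpha}(u, \mu \mid x, y) \propto \frac{1}{\left[\sum_{i=1}^{n} |x_i - \mu|^{\delta} + \sum_{i=1}^{n} |y_i - u|^{\delta}\right]^{(2n-1+a)/\delta} (\mu^2 + u^2)^{(1-\alpha)/2}}.
$$

For $|\mu| \le \frac{3}{2}|\bar{x}|$, use the inequality $\sum_{i=1}^{n} |x_i - \mu|^{\delta} \ge \sum_{i=1}^{n} |x_i - x_{med}|^{\delta}$, where $x_{med} =$
median$(x_1, \ldots, x_n)$, while for $|\mu| > \frac{3}{2}|\bar{x}|$, use the inequality $\sum_{i=1}^{n} |x_i - \mu|^{\delta} \ge n|\bar{x} - \mu|^{\delta}$.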




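Similarly, for $|u| \le \frac{3}{2}|\bar{y}|$, use the inequality $\sum_{i=1}^{n} |y_i - u|^{\delta} \ge \sum_{i=1}^{n} |y_i - y_{med}|^{\delta}$, where
$y_{med} =$ median$(y_1, \ldots, y_n)$, while for $|u| > \frac{3}{2}|\bar{y}|$, use the inequality $\sum_{i=1}^{n} |y_i - u|^{\delta} \ge
n|\bar{y} - u|^{\delta}$. Now by a polar transformation of coordinates, one can verify that
$(\mu^2 + u^2)^{-(1-\alpha)/2}$ is integrable when $|\mu| \le \frac{3}{2}|\bar{x}|$ and $|u| \le \frac{3}{2}|\bar{y}|$. Also for $|\mu| > \frac{3}{2}|\bar{x}|$ and $|u| > \frac{3}{2}|\bar{y}|$,
$|\mu - \bar{x}| \ge \frac{1}{2}|\bar{x}|$ and $|u - \bar{y}| \ge \frac{1}{2}|\bar{y}|$, and hence

$$
\begin{aligned}
&\int_{|\mu| > \frac{3}{2}|\bar{x}|} \int_{|u| > \frac{3}{2}|\bar{y}|} (\mu^2 + u^2)^{-\frac{1-\alpha}{2}} \left[\sum_{i=1}^{n} |x_i - \mu|^{\delta} + \sum_{i=1}^{n} |y_i - u|^{\delta}\right]^{-\frac{2n-1+a}{\delta}} du\, d\mu \\
&\quad \le \left[\tfrac{9}{4}(\bar{x}^2 + \bar{y}^2)\right]^{-\frac{1-\alpha}{2}} n^{-\frac{2n-1+a}{\delta}} \int_{|\mu| > \frac{3}{2}|\bar{x}|} \int_{|u| > \frac{3}{2}|\bar{y}|} \left(|\mu - \bar{x}|^{\delta} + |u - \bar{y}|^{\delta}\right)^{-\frac{2n-1+a}{\delta}} du\, d\mu \\
&\quad \le \left[\tfrac{9}{4}(\bar{x}^2 + \bar{y}^2)\right]^{-\frac{1-\alpha}{2}} n^{-\frac{2n-1+a}{\delta}} \iint_{|\mu|^{\delta} + |u|^{\delta} \ge \min\{|\bar{x}/2|^{\delta},\, |\bar{y}/2|^{\delta}\}} \left(|\mu|^{\delta} + |u|^{\delta}\right)^{-\frac{2n-1+a}{\delta}} du\, d\mu \\
&\quad = \left[\tfrac{9}{4}(\bar{x}^2 + \bar{y}^2)\right]^{-\frac{1-\alpha}{2}} n^{-\frac{2n-1+a}{\delta}}\, \frac{8}{\delta} \int_{0}^{\pi/2} \int_{r^{\delta} \ge \min\{|\bar{x}/2|^{\delta},\, |\bar{y}/2|^{\delta}\}} r^{-(2n-2+a)} \\
&\qquad \times \left[\cos^{1+\frac{2}{\delta}}(\phi) \sin^{-1+\frac{2}{\delta}}(\phi) + \cos^{-1+\frac{2}{\delta}}(\phi) \sin^{1+\frac{2}{\delta}}(\phi)\right] dr\, d\phi \\
&\quad < \infty
\end{aligned}
$$

if $2n - 3 + a > 0$. Similar arguments can be used when $|\mu| \le \frac{3}{2}|\bar{x}|$, $|u| > \frac{3}{2}|\bar{y}|$ or
$|\mu| > \frac{3}{2}|\bar{x}|$, $|u| \le \frac{3}{2}|\bar{y}|$. Theorem 2.3.1 follows.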


Remark 2.3.1 It may be noted that the propriety of the posterior does not depend on $\delta$ ($\ge 1$). For
$0 < a \le 1$, one needs $n \ge 2$ for proper posteriors, while for $a > 1$, one needs $n \ge 1$
for proper posteriors.

Next  we  consider  the  t-family.  The  following  theorem  is  proved. 

Theorem 2.3.2 Assume $f$ is a t density function with d.f. $\nu > 0$. Then, with probability
one, the posterior distributions are proper for the class of priors given in
Theorem 2.3.1 if $(n-2)\nu > (3+a)/2$.



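Proof of Theorem 2.3.2 For $\pi_{a,\alpha}(\theta,\mu,\sigma) \propto \sigma^{-a}|\mu|^{\alpha}(1+\theta^2)^{-(1-\alpha)/2}$, the joint posterior of
$\theta$, $\mu$ and $\sigma$ is given by

\[
\pi_{a,\alpha}(\theta,\mu,\sigma \mid x,y) \propto \prod_{i=1}^{n}\Big\{\sigma^{-2}\Big[\Big(1+\frac{(x_i-\mu)^2}{\nu\sigma^2}\Big)\Big(1+\frac{(y_i-\theta\mu)^2}{\nu\sigma^2}\Big)\Big]^{-(\nu+1)/2}\Big\}\,
\sigma^{-a}|\mu|^{\alpha}(1+\theta^2)^{-(1-\alpha)/2}.
\]

Letting $u = \theta\mu$, the joint posterior of $u$, $\mu$ and $\sigma$ is given by

\[
\pi_{a,\alpha}(u,\mu,\sigma \mid x,y) \propto \prod_{i=1}^{n}\Big\{\sigma^{-2}\Big[\Big(1+\frac{(x_i-\mu)^2}{\nu\sigma^2}\Big)\Big(1+\frac{(y_i-u)^2}{\nu\sigma^2}\Big)\Big]^{-(\nu+1)/2}\Big\}\,
\sigma^{-a}(\mu^2+u^2)^{-(1-\alpha)/2}. \tag{2.3.1}
\]

Let $\epsilon = \min\{\nu, 1\}$. Since

\[
\begin{aligned}
\prod_{i=1}^{n}\big[(1+(x_i-\mu)^2\sigma^{-2}\nu^{-1})(1+(y_i-u)^2\sigma^{-2}\nu^{-1})\big]^{(\nu+1)/2}
&\ge \big[1+n\{(\mu-\bar{x})^2+(u-\bar{y})^2\}\sigma^{-2}\nu^{-1}\big]^{(\nu+1)/2} \\
&\ge \big[1+n\{(\mu-\bar{x})^2+(u-\bar{y})^2\}\sigma^{-2}\nu^{-1}\big]^{(\epsilon+1)/2},
\end{aligned}
\]

for $\sigma > 1$, the right hand side of (2.3.1)

\[
\le \frac{1}{\sigma^{2n+a}}\,\frac{1}{\big[1+n\{(\mu-\bar{x})^2+(u-\bar{y})^2\}\sigma^{-2}\nu^{-1}\big]^{(\epsilon+1)/2}}\,\frac{1}{(u^2+\mu^2)^{(1-\alpha)/2}}.
\]

Hence it is easy to see that $\int_{\sigma>1}\int\!\!\int \pi_{a,\alpha}(u,\mu,\sigma \mid x,y)\,du\,d\mu\,d\sigma < \infty$ if $2n + a > 3$.

On the other hand, for $0 < \sigma \le 1$,

\[
\begin{aligned}
\prod_{i=1}^{n}\big[(1+(x_i-\mu)^2\nu^{-1}\sigma^{-2})(1+(y_i-u)^2\nu^{-1}\sigma^{-2})\big]
&\ge \prod_{i=1}^{n}\big[1+(x_i-\mu)^2(y_i-u)^2\nu^{-2}\sigma^{-4}\big] \\
&\ge 1+\nu^{-2(n-2)}\sigma^{-4(n-2)}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\prod_{k\ne i,j}(x_k-\mu)^2(y_k-u)^2.
\end{aligned}
\]

Hence the right hand side of (2.3.1)

\[
\begin{aligned}
&\le \frac{1}{\sigma^{2n+a}}\,
\frac{1}{\big\{1+\nu^{-2(n-2)}\sigma^{-4(n-2)}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\prod_{k\ne i,j}(x_k-\mu)^2(y_k-u)^2\big\}^{(\nu+1)/2}}\,
\frac{1}{(u^2+\mu^2)^{(1-\alpha)/2}} \\
&\le \sigma^{2(n-2)(\nu+1)-2n-a}\,
\frac{1}{\big[\nu^{-2(n-2)}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\prod_{k\ne i,j}(x_k-\mu)^2(y_k-u)^2\big]^{(\nu+1)/2}}\,
\frac{1}{(u^2+\mu^2)^{(1-\alpha)/2}}.
\end{aligned}
\]

We know that, with probability 1, no two of the $x_i$ or the $y_j$ are equal. Let
$D_{ij} = O(\{x_i,y_j\},\delta)$, which is a sphere centered at $\{x_i,y_j\}$ with radius $\delta$, $\delta$ being
chosen small enough such that $D_{ij} \cap D_{i'j'} = \emptyset$ for all $i \ne i'$ or $j \ne j'$. It is easy to
see that $\pi_{a,\alpha}(u,\mu,\sigma \mid x,y)$ is integrable on $D_{ij}$, $0 < \sigma \le 1$, for $i,j = 1,2,\ldots,n$, and on
$\bigcap_{i,j=1}^{n} D_{ij}^{c}$, $0 < \sigma \le 1$, if $2(n-2)(\nu+1) - 2n - a = 2(n-2)\nu - 4 - a > -1$.

Theorem 2.3.2 follows.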


A  detailed  analysis  is  given  in  the  normal  case.  We  compare  the  Bayesian  analysis 
with  the  likelihood  based  analysis.  This  generalizes  the  findings  of  Liseo  (1993). 

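First, we address the likelihood based analysis. When $f$ is standard normal, the
profile likelihood for $\theta_1$ is given by (under parameterization $(\theta_1,\theta_2,\theta_3)$)

\[
pl(\theta_1) \propto S^{-n}(\theta_1) \tag{2.3.2}
\]

where

\[
\begin{aligned}
S(\theta_1) \equiv S(\theta_1; x,y) &= \sum_{i=1}^{n}x_i^2 + \sum_{i=1}^{n}y_i^2 - n\,\frac{(\bar{x}+\theta_1\bar{y})^2}{1+\theta_1^2} \\
&= \sum_{i=1}^{n}(x_i-\bar{x})^2 + \sum_{i=1}^{n}(y_i-\bar{y})^2 + \frac{n(\bar{y}-\theta_1\bar{x})^2}{1+\theta_1^2}. 
\end{aligned}
\tag{2.3.3}
\]

$S(\theta_1)$ is maximized at $\theta_1 = -\bar{x}/\bar{y}$ and minimized at $\theta_1 = \bar{y}/\bar{x}$. Hence, $pl(\theta_1)$ is bounded
away from zero when $|\theta_1| \to \infty$. Consequently, the confidence interval obtained by
inverting the profile likelihood test statistic could potentially be the entire real line.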
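
As a numerical illustration of (2.3.2) and (2.3.3), the following Python sketch evaluates $S(\theta)$ and the unnormalized profile likelihood on a grid. The data vectors, the grid and the function names are hypothetical choices made for illustration only; they are not taken from the dissertation.

```python
import numpy as np

def S(theta, x, y):
    """S(theta) of (2.3.3): within-sample sums of squares plus the ratio term."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    within = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
    return within + n * (y.mean() - theta * x.mean()) ** 2 / (1.0 + theta ** 2)

def profile_likelihood(theta, x, y):
    """Unnormalized profile likelihood pl(theta) = S(theta)^(-n) of (2.3.2)."""
    return S(theta, x, y) ** (-np.asarray(x).size)

# Hypothetical data with equal sample sizes (illustrative values only).
x = np.array([0.8, 1.3, 0.9, 1.1, 0.9])
y = np.array([2.1, 1.7, 2.4, 1.9, 1.9])

grid = np.linspace(-50.0, 50.0, 20001)
pl = profile_likelihood(grid, x, y)

theta_max = grid[np.argmax(pl)]   # expected near ybar / xbar
theta_min = grid[np.argmin(pl)]   # expected near -xbar / ybar

# Limit of pl(theta) as |theta| -> infinity: S tends to within + n * xbar^2,
# so the profile likelihood levels off at a strictly positive value.
n = x.size
within = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
tail_limit = (within + n * x.mean() ** 2) ** (-n)
print(theta_max, theta_min, tail_limit / pl.max())
```

The printed ratio is the height of the tails relative to the peak; because it is strictly positive, any likelihood-ratio cutoff below that height yields an unbounded interval.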




The  modified  profile  likelihood  (Barndorff-Nielsen,  1983)  is 

d(02(0i), Wi)      [S{9i)]n  *\x  +  0iy\ 

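where $I_{(\theta_2,\theta_3)}(\theta_1,\hat\theta_2(\theta_1),\hat\theta_3(\theta_1))$ is the observed Fisher information matrix for $(\theta_2,\theta_3)$,
and $\hat\theta_2(\theta_1),\hat\theta_3(\theta_1)$ and $\hat\theta_2,\hat\theta_3$ are the MLEs of $\theta_2,\theta_3$ holding $\theta_1$ fixed and not fixed, respectively.
As with $pl(\theta_1)$, $mpl(\theta_1)$ is bounded away from zero when $|\theta_1| \to \infty$. The
estimate maximizing $mpl(\theta_1)$ is $\hat\theta = -\bar{x}/\bar{y}$, which is the value where $pl(\theta_1)$ is minimized.
So $mpl(\theta_1)$ performs even worse in this case.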

The  conditional  profile  likelihood  (Cox  &  Reid,  1987)  is 

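\[
cpl(\theta_1) = pl(\theta_1)\,\big|I_{(\theta_2,\theta_3)}(\theta_1,\hat\theta_2(\theta_1),\hat\theta_3(\theta_1))\big|^{-1/2} \propto \frac{1}{S^{n-1}(\theta_1)},
\]

which once again is bounded away from 0 as $|\theta_1| \to \infty$. Thus, inversion of the
conditional profile likelihood ratio statistic can also lead to the entire real line as the
interval for $\theta_1$.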

Under  the  original  parameterization  (0, //,  cr),  the  profile  likelihood  function,  mod- 
ified profile  likelihood  function  and  conditional  profile  likelihood  function  are  given 
by  pl{9),mpl{9){\  +  92)*,cpl(9)(l  +92)*,  respectively.  In  such  a  case,  modified  and 
conditional  profile  likelihoods  are  adjusted  in  a  wrong  direction.  There  seems  to  be 
no  way  to  define  a  likelihood  analysis  in  an  operative  way. 

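When $f$ is a standard normal density, the exact forms of posterior distributions
exist for both the second order matching prior $\pi_m$ and the one-at-a-time reference prior
$\pi_R$. The theorem below summarizes the results.

Theorem 2.3.3 Under the second order matching prior $\pi_m \propto |\mu|/\sigma$ and the one-at-a-time
reference prior $\pi_R \propto \sigma^{-1}(1+\theta^2)^{-1/2}$, the marginal posterior distributions of $\theta$ are

\[
\pi_m(\theta \mid x,y) \propto g_m(\theta)\,\frac{1}{1+\theta^2}
\]

and

\[
\pi_R(\theta \mid x,y) \propto g_R(\theta)\,\frac{1}{1+\theta^2}
\]

respectively, where

\[
g_m(\theta) = \frac{1}{S^{n-1}(\theta)}\Big[\frac{1}{(n-1)(1+d^2(\theta))^{n-1}} + |d(\theta)|\int_{-|d(\theta)|}^{|d(\theta)|}\frac{dz}{(1+z^2)^{n}}\Big],
\qquad
d(\theta) = \frac{n^{1/2}(\bar{x}+\theta\bar{y})}{[(1+\theta^2)S(\theta)]^{1/2}},
\]

$g_R(\theta) = S^{-(n-1/2)}(\theta)$, and $S(\theta)$ is given in (2.3.3).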

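Proof of Theorem 2.3.3 For $\pi_m \propto |\mu|/\sigma$, the joint posterior of $\theta$, $\mu$ and $\sigma$ is given by

\[
\pi_m(\theta,\mu,\sigma \mid x,y) \propto \frac{|\mu|}{\sigma^{2n+1}}\exp\Big\{-\frac{1}{2\sigma^2}\Big[\sum_{i=1}^{n}(x_i-\mu)^2 + \sum_{i=1}^{n}(y_i-\theta\mu)^2\Big]\Big\}.
\]

Integrating with respect to $\sigma$, the joint posterior of $\theta$ and $\mu$ is given by

\[
\pi_m(\theta,\mu \mid x,y) \propto \frac{|\mu|}{\big[\sum_{i=1}^{n}(x_i-\mu)^2 + \sum_{i=1}^{n}(y_i-\theta\mu)^2\big]^{n}}
= \frac{|\mu|}{\big[n(1+\theta^2)\big(\mu - \frac{\bar{x}+\theta\bar{y}}{1+\theta^2}\big)^2 + S(\theta)\big]^{n}}.
\]

Then, with the substitution $z = [n(1+\theta^2)/S(\theta)]^{1/2}\big(\mu - \frac{\bar{x}+\theta\bar{y}}{1+\theta^2}\big)$,

\[
\begin{aligned}
\pi_m(\theta \mid x,y) &\propto \int_{-\infty}^{\infty}\frac{|\mu|}{\big[n(1+\theta^2)\big(\mu - \frac{\bar{x}+\theta\bar{y}}{1+\theta^2}\big)^2 + S(\theta)\big]^{n}}\,d\mu \\
&\propto \frac{1}{S^{n-1}(\theta)}\,\frac{1}{1+\theta^2}\int_{-\infty}^{\infty}\frac{|z+d(\theta)|}{(z^2+1)^{n}}\,dz \\
&= \Big[-\int_{-\infty}^{-d(\theta)}\frac{z+d(\theta)}{(z^2+1)^{n}}\,dz + \int_{-d(\theta)}^{\infty}\frac{z+d(\theta)}{(z^2+1)^{n}}\,dz\Big]\frac{1}{S^{n-1}(\theta)}\,\frac{1}{1+\theta^2} \\
&= \Big[\frac{1}{(n-1)(1+d^2(\theta))^{n-1}S^{n-1}(\theta)} + \frac{|d(\theta)|}{S^{n-1}(\theta)}\int_{-|d(\theta)|}^{|d(\theta)|}\frac{dz}{(1+z^2)^{n}}\Big]\frac{1}{1+\theta^2}.
\end{aligned}
\]

For $\pi_R \propto \sigma^{-1}(1+\theta^2)^{-1/2}$, the proof is similar and simpler.
Theorem 2.3.3 follows.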
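
The closed form in Theorem 2.3.3 can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $\pi_m \propto |\mu|/\sigma$, uses the expressions for $d(\theta)$ and $g_m(\theta)$ with the factor $1/(n-1)$ written out, and compares them with a brute-force integration over $\mu$. The data, grids and function names are hypothetical.

```python
import numpy as np

def trapezoid(values, grid):
    """Plain trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

def S(theta, x, y):
    """S(theta) of (2.3.3)."""
    n = x.size
    within = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
    return within + n * (y.mean() - theta * x.mean()) ** 2 / (1.0 + theta ** 2)

def d(theta, x, y):
    """d(theta) = sqrt(n) (xbar + theta ybar) / sqrt((1 + theta^2) S(theta))."""
    n = x.size
    return np.sqrt(n) * (x.mean() + theta * y.mean()) / np.sqrt((1.0 + theta ** 2) * S(theta, x, y))

def posterior_m(theta, x, y):
    """Unnormalized pi_m(theta | x, y) = g_m(theta) / (1 + theta^2)."""
    n = x.size
    dd = abs(d(theta, x, y))
    z = np.linspace(-dd, dd, 20001)
    inner = trapezoid((1.0 + z ** 2) ** (-n), z)
    g_m = (1.0 / ((n - 1) * (1.0 + dd ** 2) ** (n - 1)) + dd * inner) / S(theta, x, y) ** (n - 1)
    return g_m / (1.0 + theta ** 2)

def posterior_m_bruteforce(theta, x, y):
    """Integrate |mu| [sum (x_i - mu)^2 + sum (y_i - theta mu)^2]^(-n) over mu directly."""
    n = x.size
    mu = np.linspace(-50.0, 50.0, 200001)
    q = np.sum((x[:, None] - mu) ** 2, axis=0) + np.sum((y[:, None] - theta * mu) ** 2, axis=0)
    return trapezoid(np.abs(mu) * q ** (-n), mu)

# Hypothetical data (illustrative values only).
x = np.array([0.8, 1.3, 0.9, 1.1, 0.9])
y = np.array([2.1, 1.7, 2.4, 1.9, 1.9])

for theta in (-0.5, 0.3, 2.0):
    ratio = posterior_m_bruteforce(theta, x, y) / posterior_m(theta, x, y)
    print(theta, ratio)  # the ratio should be the same constant, 1/n, for every theta
```

A ratio that does not vary with $\theta$ indicates that the two expressions agree up to the normalizing constant, which is all the theorem asserts.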


It is easy to see that $|d(\theta)|$ is increasing in $1/S(\theta)$, so that $g_m(\theta)$, $g_R(\theta)$ and $pl(\theta)$
are all increasing functions of $1/S(\theta)$. Hence $g_m(\theta)$, $g_R(\theta)$ and $pl(\theta)$ behave very
similarly and are maximized at $\theta = \bar{y}/\bar{x}$ and minimized at $\theta = -\bar{x}/\bar{y}$. However,
due to the factor $(1+\theta^2)^{-1}$, $\pi_m(\theta \mid x,y)$ and $\pi_R(\theta \mid x,y)$ are proper density functions, thus a
credible interval for $\theta$ under the second order matching prior or the reference prior will
never become an infinite set. Substituting $n-1$ by $n$ in $\pi_m(\theta \mid x,y)$, we get the posterior
distribution for Jeffreys' prior. Like $pl(\theta)$ and $cpl(\theta)$, all the posterior distributions are
bimodal, but when the sample size is moderately large, or $\theta$ is small, one of the modes is
far out in the tail. This is evident in Figure 2.1, which shows that one mode is
dominating and the other is invisible (located at the left side of zero) because of the
large difference between the two modes in terms of the scaling.

Liseo (1993) derived the profile likelihood, the modified likelihood, and the posterior distributions
of $\theta$ under the reference prior and Jeffreys' prior when $f(\cdot)$ is normal and $\sigma = 1$. In
such a case, Jeffreys' prior is also the unique second order matching prior. Liseo's
derivation of the posterior distribution of $\theta$ under Jeffreys' prior, however, seems to
involve an algebraic error. The error occurs in $\pi_J(\theta \mid x,y)$ on p. 297 of Liseo (1993), where
+1 should be -1. Also, this error seems to have carried through in his subsequent
discussion and Figure 1 on p. 298. In particular, the larger mode of the posterior
distribution under Jeffreys' prior is located at the left side of 0, when it should actually
be on the right side. Our Figure 2.2 gives the correct picture with the same sample
means and sample size as Liseo (1993). This figure also indicates that for small $n$,
the posteriors based on $\pi_m$ and $\pi_R$ are very close, but the posterior based on $\pi_J$ is
quite different.




Figure  2.1:  a.)  Posteriors  Based  On  Noninformative  Priors 


2.4    Simulation  Study 


We compare the frequentist coverage probability of Bayesian credible intervals
based on the second order matching prior $\pi_m$ and the one-at-a-time reference prior
$\pi_R$ when $f(\cdot)$ is normal or t. Since no closed form posteriors are available when $f(\cdot)$ is t,
and the posteriors are intractable even when $f(\cdot)$ is normal, the posterior quantiles
are obtained via Markov Chain Monte Carlo (MCMC) numerical
integration. We provide below some of the implementational details for the
t-likelihood.

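To this end, first represent a t-density with location parameter $\mu$, scale parameter $\sigma$
and degrees of freedom $\nu$ as the gamma scale mixture of a normal density. Specifically,
writing $f_{\mu,\sigma,\nu}(t)$ for such a pdf, one has

\[
f_{\mu,\sigma,\nu}(t) = \int_{0}^{\infty}\Big(\frac{r}{2\pi\sigma^2}\Big)^{1/2}\exp\Big[-\frac{r}{2\sigma^2}(t-\mu)^2\Big]\,
\frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}\exp\Big(-\frac{\nu r}{2}\Big)\,r^{\nu/2-1}\,dr. \tag{2.4.1}
\]

We denote by $r_{1i}$ and $r_{2i}$ the mixing variables associated with the likelihoods
corresponding to the $x_i$ and the $y_i$ respectively ($i = 1,\ldots,n$), where the $r_{1i}$ and $r_{2i}$
have the same distribution as $r$ in (2.4.1). We shall write $r_1 = (r_{11},\ldots,r_{1n})$, $r_2 =
(r_{21},\ldots,r_{2n})$.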
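
The scale-mixture representation can be illustrated by simulation. The sketch below assumes the standard form of the mixture, in which $r$ has a gamma distribution with shape $\nu/2$ and rate $\nu/2$ and, given $r$, the observation is normal with variance $\sigma^2/r$. The function name, seed and sample size are hypothetical choices.

```python
import numpy as np

def sample_t_via_mixture(mu, sigma, nu, size, rng):
    """Draw from the t density f_{mu,sigma,nu} through the gamma scale mixture."""
    # r ~ Gamma with shape nu/2 and rate nu/2 (NumPy's gamma takes scale = 1/rate).
    r = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)
    # Given r, the observation is normal with mean mu and variance sigma^2 / r.
    return mu + sigma * rng.standard_normal(size) / np.sqrt(r)

rng = np.random.default_rng(12345)
draws = sample_t_via_mixture(mu=0.0, sigma=1.0, nu=10.0, size=200_000, rng=rng)

# For nu = 10 the t variance is nu / (nu - 2) = 1.25, and the central 95%
# interval is roughly (-2.228, 2.228), the usual t-table value for 10 d.f.
print(draws.var(), np.mean(np.abs(draws) < 2.228))
```

Introducing the mixing variables is what turns the t-likelihood into a conditionally normal one, so that most full conditionals become standard distributions.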

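Then, with the prior $\pi(\theta,\mu,\sigma) \propto \sigma^{-a}|\mu|^{\alpha}(1+\theta^2)^{-(1-\alpha)/2}$, and the transformation
$u = \theta\mu$, the joint posterior of $u$, $\mu$, $\sigma^2$, $r_1$ and $r_2$ given $x$ and $y$ is

\[
\begin{aligned}
\pi(u,\mu,\sigma^2,r_1,r_2 \mid x,y) \propto{}& (\sigma^2)^{-(2n+a+1)/2}\exp\Big[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}r_{1i}(x_i-\mu)^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}r_{2i}(y_i-u)^2\Big] \\
&\times\prod_{i=1}^{n}\Big[\exp\Big(-\frac{\nu}{2}r_{1i}\Big)\exp\Big(-\frac{\nu}{2}r_{2i}\Big)\,r_{1i}^{(\nu-1)/2}\,r_{2i}^{(\nu-1)/2}\Big](\mu^2+u^2)^{-(1-\alpha)/2}.
\end{aligned}
\]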


Figure 2.2: b.) Posteriors Based On Noninformative Priors
(mean(x) = 0.25, mean(y) = 5, n = 3, var(x) = 1, var(y) = 1)

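This leads to the full conditionals

\[
\begin{aligned}
\mu \mid u,\sigma^2,r_1,r_2,x,y &\propto \exp\Big[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}r_{1i}(x_i-\mu)^2\Big](\mu^2+u^2)^{-(1-\alpha)/2}; \\
u \mid \mu,\sigma^2,r_1,r_2,x,y &\propto \exp\Big[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}r_{2i}(y_i-u)^2\Big](\mu^2+u^2)^{-(1-\alpha)/2}; \\
\sigma^{-2} \mid \mu,u,r_1,r_2,x,y &\sim \mathrm{Gamma}\Big(\frac{\sum_{i=1}^{n}r_{1i}(x_i-\mu)^2 + \sum_{i=1}^{n}r_{2i}(y_i-u)^2}{2},\ \frac{2n+a-1}{2}\Big),
\end{aligned}
\]

where a Gamma($b$,$c$) pdf is proportional to $\exp(-bz)z^{c-1}$;

\[
\begin{aligned}
r_{1i} \mid u,\mu,\sigma^2,r_{1j}\,(j \ne i),r_2,x,y &\sim \mathrm{Gamma}\Big(\frac{1}{2}\Big(\nu + \frac{(x_i-\mu)^2}{\sigma^2}\Big),\ \frac{\nu+1}{2}\Big); \\
r_{2i} \mid u,\mu,\sigma^2,r_1,r_{2j}\,(j \ne i),x,y &\sim \mathrm{Gamma}\Big(\frac{1}{2}\Big(\nu + \frac{(y_i-u)^2}{\sigma^2}\Big),\ \frac{\nu+1}{2}\Big).
\end{aligned}
\]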

In actual implementation, we shall consider only the two cases $\alpha = 0$ and $\alpha = 1$.
With $a$ appropriately chosen, these correspond to the reference prior
and the second order matching prior, respectively.

Because the conditionals of $\mu$ and $u$ given the rest are non-standard, the
Metropolis-Hastings algorithm is used to generate samples from these conditionals. The
Metropolis-Hastings algorithm is an MCMC method that has been widely used in the
mathematical physics and image restoration communities, but was only recently discovered by
statisticians. We refer to Chib and Greenberg (1995) for a review of recent
developments for this method.

In each case we computed the 0.05 and 0.95 posterior quantiles from a
sample of size 10,000 (discarding the first 5000) and repeated the iterations 1500 times
to estimate the coverage probability. We ran these simulations for different values
of $(\theta,\mu)$ when $\sigma^2 = 1$, $n = 5, 10$ and $\nu = 1, 10, \infty$, where $\nu = 1$ and $\nu = \infty$ correspond
respectively to the Cauchy and normal distributions. The results are given in Tables
2.1-2.6. The computing was done on a Sun Sparc10 workstation using an interface between
C and Splus. All random numbers were generated by Splus. In the normal case, these
results agree with the output obtained directly by numerical integration.
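
The sampling scheme described above can be sketched in Python as a Metropolis-within-Gibbs loop. This is a minimal illustration, not the original C/Splus implementation. Several choices are assumptions made here: the random-walk normal proposal and its step size, the short chain length, the simulated data, and the gamma shape $(2n+a-1)/2$ for $\sigma^{-2}$, which is what a prior proportional to $\sigma^{-a}$ in $\sigma$ implies.

```python
import numpy as np

def log_cond(loc, other, data, r, sigma2, alpha):
    """Log full conditional (up to a constant) of mu given u, or of u given mu."""
    return (-0.5 * np.sum(r * (data - loc) ** 2) / sigma2
            - 0.5 * (1.0 - alpha) * np.log(loc ** 2 + other ** 2))

def gibbs_theta(x, y, nu, alpha=1.0, a=1.0, n_iter=3000, step=0.3, seed=0):
    """Return draws of theta = u / mu for the t-likelihood with nu d.f."""
    rng = np.random.default_rng(seed)
    n = x.size
    mu, u, sigma2 = x.mean(), y.mean(), 1.0
    r1, r2 = np.ones(n), np.ones(n)
    theta = np.empty(n_iter)
    for it in range(n_iter):
        # Random-walk Metropolis step for mu (non-standard conditional).
        prop = mu + step * rng.standard_normal()
        log_ratio = log_cond(prop, u, x, r1, sigma2, alpha) - log_cond(mu, u, x, r1, sigma2, alpha)
        if np.log(rng.uniform()) < log_ratio:
            mu = prop
        # Random-walk Metropolis step for u (non-standard conditional).
        prop = u + step * rng.standard_normal()
        log_ratio = log_cond(prop, mu, y, r2, sigma2, alpha) - log_cond(u, mu, y, r2, sigma2, alpha)
        if np.log(rng.uniform()) < log_ratio:
            u = prop
        # sigma^{-2} ~ Gamma(rate, shape); NumPy's gamma takes (shape, scale = 1/rate).
        rate = 0.5 * (np.sum(r1 * (x - mu) ** 2) + np.sum(r2 * (y - u) ** 2))
        sigma2 = 1.0 / rng.gamma(0.5 * (2 * n + a - 1.0), 1.0 / rate)
        # Mixing variables: Gamma with rate (nu + residual^2 / sigma^2) / 2, shape (nu + 1) / 2.
        r1 = rng.gamma(0.5 * (nu + 1.0), 2.0 / (nu + (x - mu) ** 2 / sigma2))
        r2 = rng.gamma(0.5 * (nu + 1.0), 2.0 / (nu + (y - u) ** 2 / sigma2))
        theta[it] = u / mu
    return theta

# Hypothetical data: mu = 10, theta = 0.5, sigma = 1, t errors with 10 d.f.
data_rng = np.random.default_rng(1)
x = 10.0 + data_rng.standard_t(10, size=10)
y = 5.0 + data_rng.standard_t(10, size=10)

draws = gibbs_theta(x, y, nu=10.0)
kept = draws[1000:]                      # discard a burn-in segment
q05, q95 = np.quantile(kept, [0.05, 0.95])
print(q05, q95)
```

Repeating this over many simulated data sets and recording how often the true $\theta$ falls below each quantile gives the frequentist coverage estimates of the kind reported in the tables.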

2.5   Discussion  and  Conclusions 

The performance in matching the nominal quantile points is similar across the
normal, Cauchy and t with d.f. = 10, when the prior is $\pi_R$ or $\pi_m$. This suggests the
robustness of using $\pi_R$ and $\pi_m$ for different members of $f(\cdot)$, either with flat tails or
with heavy tails.




It is clear from the tables that $\pi_m$ performs better than $\pi_R$ in matching the target
coverage probabilities. This is intuitively clear since $\pi_m$ is a second order matching
prior, but $\pi_R$ is not. For both normal and Cauchy likelihoods, with the prior $\pi_m$,
there is close matching except for small $|\mu|$. With $\pi_R$, matching is close except for
small $|\mu|$ or for large $|\theta|$.

The above findings are best understood in the normal case where closed form
expressions for the posteriors are available. Recall $\pi_R(\theta \mid x,y) \propto S^{-(n-1/2)}(\theta)(1+\theta^2)^{-1}$.
As noted already, $S^{-1}(\theta)$ attains its maximum at $\theta = \bar{y}/\bar{x}$. Also, for large samples,
the limiting difference $S^{-n}(\theta)|_{\theta=\bar{y}/\bar{x}} - S^{-n}(\theta)|_{|\theta|=\infty}$ decreases as $|\mu|$ decreases, and
this difference does not depend on the true value of $\theta$. Thus, the posterior becomes
very flat for small $|\mu|$, leading thereby to very poor frequentist coverage probabilities.
Again, for large samples, since $n^{-1}S(\theta)$ behaves like a constant, the factor $(1+\theta^2)^{-1}$
pulls the posterior mode close to zero, once again resulting in poor frequentist
coverage probabilities.

The behavior of $\pi_m(\theta \mid x,y)$, on the other hand, is largely dependent on $|d(\theta)|$, which
attains its maximum at $\theta = \bar{y}/\bar{x}$. For large $n$, $d^2(\bar{y}/\bar{x})$ converges to $\mu_0^2(1+\theta_0^2)/2\sigma_0^2$,
where $\mu_0$, $\theta_0$ and $\sigma_0$ are the true values of the respective parameters, while $d^2(\pm\infty)$
converges to $\mu_0^2\theta_0^2/(2\sigma_0^2+\mu_0^2)$. Thus $d^2(\bar{y}/\bar{x}) - d^2(\pm\infty)$ increases in the limit as $|\theta_0|$
increases, and decreases as $|\mu_0|$ decreases. Hence the frequentist coverage probabilities
of $\pi_m(\theta \mid x,y)$ are poor for small $|\mu_0|$, but are not affected by large values of $|\theta_0|$.




Table 2.1. Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of θ
when n = 5, ν = ∞ (normal)

                   θ = 0.1         θ = 1           θ = 10          θ = 100
μ      Prior     0.05   0.95    0.05   0.95    0.05   0.95    0.05   0.95
0.1    π_m       0.004  0.990   0.000  0.963   0.000  0.538   0.000  0.614
       π_R       0.004  0.997   0.001  0.980   0.000  0.449   0.015  0.523
1      π_m       0.025  0.964   0.015  0.948   0.001  0.949   0.005  0.944
       π_R       0.021  0.975   0.016  0.966   0.059  0.932   0.362  0.638
10     π_m       0.058  0.951   0.053  0.943   0.045  0.942   0.057  0.945
       π_R       0.037  0.949   0.046  0.949   0.067  0.932   0.274  0.700
100    π_m       0.049  0.950   0.047  0.939   0.043  0.957   0.041  0.946
       π_R       0.051  0.951   0.053  0.951   0.071  0.929   0.294  0.710

Table 2.2. Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of θ
when n = 10, ν = ∞ (normal)

                   θ = 0.1         θ = 1           θ = 10          θ = 100
μ      Prior     0.05   0.95    0.05   0.95    0.05   0.95    0.05   0.95
0.1    π_m       0.003  0.990   0.000  0.956   0.000  0.759   0.000  0.815
       π_R       0.001  0.993   0.001  0.981   0.000  0.638   0.041  0.531
1      π_m       0.030  0.958   0.015  0.952   0.006  0.943   0.005  0.946
       π_R       0.035  0.970   0.041  0.959   0.071  0.923   0.267  0.672
10     π_m       0.056  0.955   0.048  0.953   0.052  0.941   0.059  0.961
       π_R       0.050  0.956   0.049  0.953   0.067  0.930   0.308  0.709
100    π_m       0.045  0.942   0.058  0.949   0.051  0.954   0.056  0.948
       π_R       0.053  0.945   0.047  0.953   0.072  0.932   0.281  0.709



Table 2.3. Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of θ
when n = 5, ν = 10

                   θ = 0.1         θ = 1           θ = 10          θ = 100
μ      Prior     0.05   0.95    0.05   0.95    0.05   0.95    0.05   0.95
0.1    π_m       0.006  0.992   0.000  0.955   0.000  0.547   0.000  0.615
       π_R       0.003  0.990   0.001  0.968   0.000  0.445   0.013  0.493
1      π_m       0.019  0.970   0.006  0.940   0.002  0.945   0.002  0.945
       π_R       0.018  0.977   0.011  0.963   0.066  0.932   0.231  0.633
10     π_m       0.041  0.951   0.060  0.937   0.051  0.961   0.059  0.947
       π_R       0.044  0.942   0.049  0.950   0.082  0.938   0.277  0.695
100    π_m       0.046  0.941   0.055  0.943   0.055  0.957   0.046  0.962
       π_R       0.046  0.942   0.061  0.943   0.061  0.918   0.267  0.678

Table 2.4. Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of θ
when n = 10, ν = 10

                   θ = 0.1         θ = 1           θ = 10          θ = 100
μ      Prior     0.05   0.95    0.05   0.95    0.05   0.95    0.05   0.95
0.1    π_m       0.005  0.993   0.000  0.955   0.000  0.715   0.000  0.789
       π_R       0.002  0.997   0.000  0.976   0.000  0.599   0.0?5  0.524
1      π_m       0.043  0.961   0.016  0.948   0.003  0.946   0.003  0.943
       π_R       0.027  0.965   0.038  0.956   0.075  0.923   0.262  0.673
10     π_m       0.049  0.955   0.055  0.948   0.049  0.955   0.051  0.951
       π_R       0.051  0.946   0.055  0.950   0.078  0.919   0.295  0.696
100    π_m       0.051  0.951   0.043  0.959   0.051  0.947   0.056  0.947
       π_R       0.051  0.943   0.047  0.945   0.077  0.922   0.283  0.716



Table 2.5. Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of θ
when n = 5, ν = 1

                   θ = 0.1         θ = 1           θ = 10          θ = 100
μ      Prior     0.05   0.95    0.05   0.95    0.05   0.95    0.05   0.95
0.1    π_m       0.003  0.992   0.001  0.938   0.000  0.421   0.000  0.341
       π_R       0.002  0.994   0.001  0.970   0.000  0.343   0.014  0.355
1      π_m       0.017  0.973   0.005  0.948   0.004  0.939   0.001  0.944
       π_R       0.016  0.985   0.007  0.967   0.044  0.865   0.216  0.616
10     π_m       0.044  0.951   0.050  0.952   0.055  0.953   0.047  0.945
       π_R       0.051  0.945   0.045  0.942   0.084  0.912   0.305  0.717
100    π_m       0.049  0.955   0.056  0.957   0.053  0.953   0.043  0.954
       π_R       0.056  0.969   0.057  0.958   0.075  0.912   0.294  0.709

Table 2.6. Frequentist coverage probabilities of 0.05 (0.95) posterior quantiles of θ
when n = 10, ν = 1

                   θ = 0.1         θ = 1           θ = 10          θ = 100
μ      Prior     0.05   0.95    0.05   0.95    0.05   0.95    0.05   0.95
0.1    π_m       0.000  0.994   0.000  0.953   0.000  0.562   0.000  0.591
       π_R       0.003  0.993   0.001  0.977   0.000  0.449   0.035  0.419
1      π_m       0.023  0.962   0.011  0.947   0.004  0.943   0.004  0.947
       π_R       0.019  0.979   0.021  0.945   0.083  0.909   0.245  0.640
10     π_m       0.045  0.958   0.051  0.957   0.047  0.952   0.057  0.957
       π_R       0.052  0.948   0.049  0.943   0.101  0.897   0.307  0.702
100    π_m       0.052  0.953   0.051  0.951   0.050  0.941   0.051  0.945
       π_R       0.051  0.956   0.049  0.943   0.088  0.915   0.277  0.697

CHAPTER  3 

NONINFORMATIVE  PRIORS  FOR  SLOPE-RATIO  PROBLEMS 

3.1  Introduction 

The  problem  of  estimating  a  ratio-type  parameter  (for  example,  the  ratio  of  two 
means  in  Chapter  2)  arises  naturally  in  the  context  of  bioassay.  A  bioassay  involves  a 
stimulus,  with  several  levels  (doses),  applied  to  a  subject.  The  responses  are  measured 
and  are  used  to  produce  a  suitable  description  of  the  dose-response  relationship.  One 
usual  aim  in  bioassay  is  to  assess  the  relative  potency  of  two  different  stimuli  (a 
comparative  assay).  If  a  direct  assay  is  considered,  the  parameter  describing  the 
relative  potency  is  the  ratio  of  two  means.  In  the  case  of  an  indirect  assay,  continuous 
responses  are  measured  under  the  most  common  model  (two  straight  lines  with  a 
common  intercept),  the  parameter  of  interest  being  the  ratio  of  two  slopes.  We  refer 
to  Finney  (1978)  and  Mendoza  (1990)  for  some  of  the  references  related  to  direct  and 
indirect  assay. 

The classical approach to the statistical analysis of these ratio-type problems
has the common drawbacks described in Fieller (1954) and Creasy (1954). On the other
hand, a likelihood based approach will also encounter difficulties in producing interval
estimates. We refer to Gleser and Hwang (1987), Liseo (1993), and Berger, Liseo and
Wolpert (1996).

Mendoza (1990) discussed Bayesian analysis for the slope-ratio problem. Under
the normality assumption, he derived the two-group reference prior (Bernardo 1979,
Berger and Bernardo 1989, 1992a,b) and compared the corresponding credible intervals
with the confidence intervals (regions) obtained via Fieller's theorem.




Mendoza  (1988)  studied  a  more  general  ratio-type  problem,  namely  that  of  making 
inferences  about  the  ratio  of  linear  combinations  of  the  coefficients  in  a  multiple  linear 
regression  model  (under  normality  assumption).  This  includes  the  ratio  of  two  means, 
the  ratio  of  two  slopes  as  particular  cases,  and  also  includes  the  problem  considered 
by  Darby  (1980)  in  the  parallel  lines  model.  Mendoza  considered  a  class  of  non- 
informative  priors  with  major  emphasis  on  the  reference  priors.  The  class  of  non- 
informative  priors  considered  by  him  includes  neither  Jeffreys'  prior,  nor  the  second 
order  probability  matching  priors  to  be  defined  in  Section  3.2. 

In  this  Chapter  we  revisit  these  ratio-type  problems,  also  from  a  Bayesian  point  of 
view  using  non-informative  priors,  particularly  the  probability  matching  priors.  The 
error  distribution  is  assumed  to  have  a  symmetric  and  smooth  density  including  but 
not  limited  to  the  normal  distribution.  In  Section  3.2,  we  consider  the  generalized 
slope-ratio  problem.  First,  we  find  an  orthogonal  transformation  and  the  class  of 
first  order  probability  matching  priors.  Second,  we  derive  a  second  order  probability 
matching  prior  which  does  not  depend  on  the  error  distribution  and  the  values  of 
covariates  used.  Jeffreys'  prior  and  one-at-a-time  reference  prior  are  also  derived  in 
this  Section. 

Section 3.3 studies the properties of the posteriors for a subclass of error
distributions which includes the normal, t and double exponential distributions. For the
normal case, the exact forms of the posterior distributions are derived. Some
comparison is provided for posterior coverage probabilities based on the second order
probability matching priors with those based on Jeffreys' prior and the reference
priors. The design issues are also stressed for the normal model such that the posterior
will provide more useful information about the location of the true parameter. Several
numerical examples are discussed in Section 3.4.

Section 3.5 considers the multiple linear regression model. When the parameter
of interest is a linear combination of the coefficients, we find that a unique second
order probability matching prior exists. When the parameter of interest is the ratio
of two linear combinations of the coefficients, a sufficient condition on the design matrix
(covariates) is given such that an orthogonal parametric transformation exists, and
further, a common second order probability matching prior also exists. The explicit
forms of posterior distributions are given for the second order probability matching
prior.

3.2    Slope-ratio  Problem 

Consider an experiment where $p$ doses $(x_{11},\ldots,x_{1p})$ of a first stimulus and $q$
doses $(x_{21},\ldots,x_{2q})$ of a second stimulus are assayed $n$ times so that a set
$\{y_{1ik},\ i = 1,\ldots,p;\ k = 1,\ldots,n;\ y_{2jk},\ j = 1,\ldots,q;\ k = 1,\ldots,n\}$ of $n(p+q)$ observations is
obtained. The assumed model is

\[
y_{1ik} = \alpha + \beta x_{1i} + e_{1ik}, \qquad k = 1,\ldots,n;\quad i = 1,\ldots,p; \tag{3.2.1}
\]
\[
y_{2jk} = \alpha + \rho\beta x_{2j} + e_{2jk}, \qquad k = 1,\ldots,n;\quad j = 1,\ldots,q; \tag{3.2.2}
\]

where the $e_{1ik}, e_{2jk}$ are i.i.d. with density function $\frac{1}{\sigma}f(\frac{e}{\sigma})$, and $f(\cdot)$ is a smooth symmetric
density function which satisfies the regularity conditions of Johnson (1970). To avoid
identification problems, the $x_{1i}$, $i = 1,\ldots,p$, are always chosen such that at least two of
them are distinct, and the $x_{2j}$, $j = 1,\ldots,q$, are always chosen such that at least one of them
is not zero.
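
To make the data layout of (3.2.1) and (3.2.2) concrete, the following Python sketch simulates one data set with normal errors and recovers a crude estimate of $\rho$ by ordinary least squares. The least-squares fit is only a sanity check and is not the Bayesian procedure developed in this chapter; the dose levels, parameter values and function names are hypothetical.

```python
import numpy as np

def simulate_slope_ratio(x1, x2, n, alpha, beta, rho, sigma, rng):
    """Simulate the two-line, common-intercept model with normal errors."""
    # Rows index doses, columns index the n replications.
    y1 = alpha + beta * x1[:, None] + sigma * rng.standard_normal((x1.size, n))
    y2 = alpha + rho * beta * x2[:, None] + sigma * rng.standard_normal((x2.size, n))
    return y1, y2

def least_squares_ratio(x1, x2, y1, y2):
    """OLS fit of (intercept, slope 1, slope 2); returns slope 2 / slope 1."""
    n = y1.shape[1]
    d1 = np.repeat(x1, n)            # dose of each observation, first stimulus
    d2 = np.repeat(x2, n)            # dose of each observation, second stimulus
    X = np.zeros((d1.size + d2.size, 3))
    X[:, 0] = 1.0                    # common intercept alpha
    X[:d1.size, 1] = d1              # slope beta acts on the first stimulus only
    X[d1.size:, 2] = d2              # slope rho * beta acts on the second stimulus only
    yvec = np.concatenate([y1.ravel(), y2.ravel()])
    coef, *_ = np.linalg.lstsq(X, yvec, rcond=None)
    return coef[2] / coef[1]

rng = np.random.default_rng(2024)
x1 = np.array([1.0, 2.0, 3.0, 4.0])  # p = 4 hypothetical doses, at least two distinct
x2 = np.array([1.0, 2.0, 3.0, 4.0])  # q = 4 hypothetical doses, not all zero
y1, y2 = simulate_slope_ratio(x1, x2, n=5, alpha=1.0, beta=2.0, rho=1.5, sigma=0.05, rng=rng)
rho_hat = least_squares_ratio(x1, x2, y1, y2)
print(y1.shape, y2.shape, rho_hat)
```

The two identification conditions in the text appear here as requirements on the design matrix: distinct first-stimulus doses separate the intercept from $\beta$, and a nonzero second-stimulus dose makes the column for $\rho\beta$ nonzero.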

Model (3.2.1)-(3.2.2) includes the one considered by Mendoza (1990), who
assumed the error distribution to be $N(0,\sigma^2)$. The usual normality assumption in
bioassay is supported by Finney (1978), who wrote on page 24 of his book: "No
other parametric formulation of the distribution has even as strong a theoretical
justification as the normal. Unless experimental evidence strongly indicates a specific
alternative as desirable, to adopt one is analytically more complicated without any
compensating advantage." But when an alternative distribution other than the normal
is specified, direct use of this information would be beneficial. Also the analysis would
be feasible by using Markov Chain Monte Carlo (MCMC) procedures.

In this chapter, the parameter of interest is $\rho$, which is the slope ratio of two
simple linear regression models. This ratio is invariant with respect to scale
transformations. In bioassay applications, this parameter is associated
with the relative potency of two different treatments (stimuli).

3.2.1    Probability  matching  priors 

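The likelihood function $L(\rho,\alpha,\beta,\sigma)$ based on (3.2.1) and (3.2.2) is given by

\[
L(\rho,\alpha,\beta,\sigma) = \frac{1}{\sigma^{n(p+q)}}\prod_{k=1}^{n}\Big[\prod_{i=1}^{p}f\Big(\frac{y_{1ik}-\alpha-\beta x_{1i}}{\sigma}\Big)\prod_{j=1}^{q}f\Big(\frac{y_{2jk}-\alpha-\rho\beta x_{2j}}{\sigma}\Big)\Big].
\]

The per unit (a single experiment without replication) Fisher information matrix
$I(\rho,\alpha,\beta,\sigma)$ is then given by

\[
I(\rho,\alpha,\beta,\sigma) = \frac{1}{\sigma^2}
\begin{pmatrix}
a_1\beta^2\sum_{j=1}^{q}x_{2j}^2 & a_1\beta\sum_{j=1}^{q}x_{2j} & a_1\rho\beta\sum_{j=1}^{q}x_{2j}^2 & 0 \\
a_1\beta\sum_{j=1}^{q}x_{2j} & a_1(p+q) & a_1\big(\sum_{i=1}^{p}x_{1i}+\rho\sum_{j=1}^{q}x_{2j}\big) & 0 \\
a_1\rho\beta\sum_{j=1}^{q}x_{2j}^2 & a_1\big(\sum_{i=1}^{p}x_{1i}+\rho\sum_{j=1}^{q}x_{2j}\big) & a_1\big(\sum_{i=1}^{p}x_{1i}^2+\rho^2\sum_{j=1}^{q}x_{2j}^2\big) & 0 \\
0 & 0 & 0 & a_2(p+q)
\end{pmatrix}
\]

where

\[
a_1 = -\int \frac{d^2\log f(z)}{dz^2}\,f(z)\,dz, \qquad
a_2 = 1 - \int z^2\,\frac{d^2\log f(z)}{dz^2}\,f(z)\,dz. \tag{3.2.3}
\]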

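Next, following Cox and Reid (1987), an orthogonal parametric transformation $\rho = \theta_1$,
$\alpha = \alpha(\theta_1,\theta_2,\theta_3,\theta_4)$, $\beta = \beta(\theta_1,\theta_2,\theta_3,\theta_4)$, $\sigma = \sigma(\theta_1,\theta_2,\theta_3,\theta_4)$ is derived by solving
the following differential equations

\[
(p+q)\frac{\partial\alpha}{\partial\rho} + \Big(\sum_{i=1}^{p}x_{1i} + \rho\sum_{j=1}^{q}x_{2j}\Big)\frac{\partial\beta}{\partial\rho} = -\beta\sum_{j=1}^{q}x_{2j} \tag{3.2.4}
\]
\[
\Big(\sum_{i=1}^{p}x_{1i} + \rho\sum_{j=1}^{q}x_{2j}\Big)\frac{\partial\alpha}{\partial\rho} + \Big(\sum_{i=1}^{p}x_{1i}^2 + \rho^2\sum_{j=1}^{q}x_{2j}^2\Big)\frac{\partial\beta}{\partial\rho} = -\rho\beta\sum_{j=1}^{q}x_{2j}^2 \tag{3.2.5}
\]
\[
\frac{a_2(p+q)}{\sigma^2}\,\frac{\partial\sigma}{\partial\rho} = 0 \tag{3.2.6}
\]

One solution of the above equations is given by

\[
\rho = \theta_1, \qquad
\beta = \theta_2 h^{-1/2}(\theta_1), \qquad
\alpha = \theta_3 - \frac{x_{2.}\theta_1 + x_{1.}}{p+q}\,\frac{\theta_2}{h^{1/2}(\theta_1)}, \qquad
\sigma = \theta_4,
\]

where

\[
h(\theta_1) = (p+q)\Big(\sum_{i=1}^{p}x_{1i}^2 + \theta_1^2\sum_{j=1}^{q}x_{2j}^2\Big) - (x_{1.} + \theta_1 x_{2.})^2
\]

and $x_{1.} = \sum_{i=1}^{p}x_{1i}$, $x_{2.} = \sum_{j=1}^{q}x_{2j}$.
Equivalently,

\[
\theta_1 = \rho, \qquad
\theta_2 = \beta h^{1/2}(\rho), \qquad
\theta_3 = \alpha + \frac{x_{2.}\rho + x_{1.}}{p+q}\,\beta, \qquad
\theta_4 = \sigma.
\]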
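
The orthogonality of the transformation can be checked numerically. The sketch below is an illustration under explicit assumptions: it takes $h(\rho) = (p+q)(\sum_i x_{1i}^2 + \rho^2\sum_j x_{2j}^2) - (x_{1.} + \rho x_{2.})^2$, which is the form implied by (3.2.4) and (3.2.5) up to a constant factor, and uses normal errors, for which $a_1 = 1$ and $a_2 = 2$. Dose levels, parameter values and function names are hypothetical.

```python
import numpy as np

def h(rho, x1, x2):
    """Assumed form of h: (p+q)(sum x1^2 + rho^2 sum x2^2) - (x1. + rho x2.)^2."""
    N = x1.size + x2.size
    return N * (np.sum(x1 ** 2) + rho ** 2 * np.sum(x2 ** 2)) - (x1.sum() + rho * x2.sum()) ** 2

def to_theta(par, x1, x2):
    """Map (rho, alpha, beta, sigma) to the orthogonal parameters theta."""
    rho, alpha, beta, sigma = par
    N = x1.size + x2.size
    return np.array([rho,
                     beta * np.sqrt(h(rho, x1, x2)),
                     alpha + (x2.sum() * rho + x1.sum()) / N * beta,
                     sigma])

def from_theta(theta, x1, x2):
    """Inverse map from theta back to (rho, alpha, beta, sigma)."""
    t1, t2, t3, t4 = theta
    N = x1.size + x2.size
    beta = t2 / np.sqrt(h(t1, x1, x2))
    alpha = t3 - (x2.sum() * t1 + x1.sum()) / N * beta
    return np.array([t1, alpha, beta, t4])

def fisher_original(par, x1, x2, a1=1.0, a2=2.0):
    """Per unit Fisher information in (rho, alpha, beta, sigma); normal errors by default."""
    rho, alpha, beta, sigma = par
    # Gradients of the mean with respect to (rho, alpha, beta), one row per dose.
    grad1 = np.column_stack([np.zeros_like(x1), np.ones_like(x1), x1])
    grad2 = np.column_stack([beta * x2, np.ones_like(x2), rho * x2])
    G = np.vstack([grad1, grad2])
    info = np.zeros((4, 4))
    info[:3, :3] = a1 * G.T @ G
    info[3, 3] = a2 * (x1.size + x2.size)
    return info / sigma ** 2

def fisher_theta(theta, x1, x2, eps=1e-6):
    """Information in theta via a central-difference Jacobian of from_theta."""
    J = np.empty((4, 4))
    for k in range(4):
        e = np.zeros(4)
        e[k] = eps
        J[:, k] = (from_theta(theta + e, x1, x2) - from_theta(theta - e, x1, x2)) / (2 * eps)
    return J.T @ fisher_original(from_theta(theta, x1, x2), x1, x2) @ J

x1 = np.array([1.0, 2.0, 4.0])        # hypothetical doses, first stimulus
x2 = np.array([1.0, 3.0, 5.0, 7.0])   # hypothetical doses, second stimulus
par = np.array([1.5, 0.7, 2.0, 1.3])  # (rho, alpha, beta, sigma)

theta = to_theta(par, x1, x2)
info_theta = fisher_theta(theta, x1, x2)
off_diagonal = info_theta - np.diag(np.diag(info_theta))
print(np.max(np.abs(off_diagonal)))   # should be numerically zero
```

An information matrix that is diagonal in $\theta$ is what allows the first order matching priors to be written down directly from its leading entry.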




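Under the new parameterization $\theta = (\theta_1,\theta_2,\theta_3,\theta_4)$, the log-likelihood function
$\log L(\theta) = \log L(\theta_1,\theta_2,\theta_3,\theta_4)$ is given by

\[
\begin{aligned}
-n(p+q)\log\theta_4 + \sum_{k=1}^{n}\Big[&\sum_{i=1}^{p}\log f\Big(\frac{y_{1ik}-\theta_3-\theta_2h^{-1/2}(\theta_1)\big(x_{1i}-\frac{x_{2.}\theta_1+x_{1.}}{p+q}\big)}{\theta_4}\Big) \\
&+ \sum_{j=1}^{q}\log f\Big(\frac{y_{2jk}-\theta_3-\theta_2h^{-1/2}(\theta_1)\big(\theta_1x_{2j}-\frac{x_{2.}\theta_1+x_{1.}}{p+q}\big)}{\theta_4}\Big)\Big].
\end{aligned}
\]

The per unit Fisher information matrix under the new parameterization $\theta = (\theta_1,\theta_2,\theta_3,\theta_4)$
is

\[
I(\theta) = \mathrm{diag}\Big\{\frac{\theta_2^2}{\theta_4^2h^2(\theta_1)}\,d_1,\ \frac{1}{\theta_4^2}\,d_2,\ \frac{1}{\theta_4^2}\,d_3,\ \frac{1}{\theta_4^2}\,d_4\Big\},
\]

where

\[
\begin{aligned}
d_1 &= a_1\,\frac{\big((p+q)\sum_{i=1}^{p}x_{1i}^2 - x_{1.}^2\big)\big((p+q)\sum_{j=1}^{q}x_{2j}^2 - x_{2.}^2\big) - x_{1.}^2x_{2.}^2}{p+q}; \\
d_2 &= \frac{a_1}{p+q}; \\
d_3 &= a_1(p+q); \\
d_4 &= a_2(p+q),
\end{aligned}
\tag{3.2.8}
\]

$a_1, a_2$ being defined in (3.2.3). Then from Tibshirani (1989), the class of first order
probability matching priors is characterized by:

\[
\pi(\theta_1,\theta_2,\theta_3,\theta_4) \propto \frac{|\theta_2|}{\theta_4h(\theta_1)}\,g(\theta_2,\theta_3,\theta_4)
\]

where $g(\cdot)$ is an arbitrary positive function of $\theta_2$, $\theta_3$ and $\theta_4$, differentiable in each
argument.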

To narrow down the choice of the first order probability matching priors (infinitely
many), we consider second order probability matching priors. This is accomplished
by solving the differential equations in (2.10) of Mukerjee and Ghosh (1996). After
some algebra, we have


aiogL(fl)  3  _  n 

Iw    -E{  )  -0 

x        i  dHogL(e)  e2 

-L\\2     =  —  £ — 0„,n„        —  —dv 


n    wide*  eih2^) 

a3  log  L(6) 
-l114   -  £    Q/l2  Q/1 — -a3- 


where 


as  =  dl(2  _  J — 0^A£^£})  (3  2-9) 

ai 

ai  and  d\  being  defined  respectively  in  (  3.2.3)  and  (  3.2.8). 
Now,  (2.10)  of  Mukerjee  and  Ghosh(1996)  simplifies  to 

of29(02,  e3,  e4)  +  c—g{e2,  e3,  eA)  =  o  (3.2.10) 

where  c  =  -^J,  a3  and  dhi  =  1,2,4  are  given  in  (  3.2.9)  and  (  3.2.8). 

A  general  class  of  solutions  is  given  by  9(62,63,64)  =  t(c$l  -  64,63),  where  t(-) 
is  an  arbitrary  smooth  function.  As  same  as  that  in  Chapter  2,  t(-)  depends  on  the 
choice  of  /(•)  (though  c),  unless  t[cB\  -  6\,Q%)  =  l(63),  where  l(63)  is  an  arbitrary 
smooth  function  of  63. 

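The corresponding second order probability matching priors are now

\[
\pi(\theta_1,\theta_2,\theta_3,\theta_4) \propto \frac{|\theta_2|}{\theta_4h(\theta_1)}\,l(\theta_3).
\]

Equivalently, the corresponding second order probability matching priors under the
original parameterization are

\[
\pi(\rho,\alpha,\beta,\sigma) \propto \frac{|\beta|}{\sigma}\,l\Big(\alpha + \frac{x_{2.}\rho + x_{1.}}{p+q}\,\beta\Big).
\]

Since these second order probability matching priors, in general, depend on $x_{2.}$ and
$x_{1.}$ unless $l(\cdot)$ = constant, the recommended second order probability matching prior
is

\[
\pi_m(\rho,\alpha,\beta,\sigma) \propto \frac{|\beta|}{\sigma}. \tag{3.2.11}
\]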

3.2.2    Reference  priors 

Following Bernardo (1979) and Berger and Bernardo (1989, 1992a,b), using rectangular
compacts for $\theta_1$, $\theta_2$, $\theta_3$ and $\theta_4$, the two-group reference prior is given by

7rRx(p,  a,/3,a)  oc    1  \  , 

while  the  one-at-a-time  reference  prior  is  given  by 

nR{p,a,  A  a)  oc  -j—  . 

Each  one  of  the  above  priors  is  a  first  order  probability  matching  prior  by  proper 
choice  of  g(-),  but  none  is  a  second  order  probability  matching  prior.  Jeffreys'  prior, 
given  by 

7rJ(p,a,(3,a)  oc  ^ 
a 

is  also  first  order  probability  matching  prior,  but  is  not  a  second  order  probability 
matching  prior. 

Remark 3.2.1 In the actual derivation of the reference priors, we first derive the
reference priors under the orthogonal parameterization. This simplifies some calculations.
The reference priors under the original parameterization are derived consequently
by the usual Jacobian calculus. The invariance of reference priors under one to one
parameterization is discussed in Datta and Ghosh (1996).

Remark 3.2.2 Unlike the second order probability matching prior $\pi_m \propto |\beta|/\sigma$, the
reference priors depend on the choice of the covariates $x_{1i}$, $i = 1,\ldots,p$; $x_{2j}$, $j = 1,\ldots,q$.
In particular, they put the biggest weight at $\rho^* = \frac{x_{1.}x_{2.}}{(p+q)\sum_{j=1}^{q}x_{2j}^2 - x_{2.}^2}$, and gradually
reduce the weight when $\rho$ is away from $\rho^*$.

Remark  3.2.3  The  two-group  reference  prior  was  also  derived  by  Mendoza  (1990)  in 
the  normal  case. 

3.3    Posterior  Analysis 

As in Chapter 2, a Bayesian analysis for a general density function $f(\cdot)$ would be
difficult. We concentrate on two important families of distributions: (i) the power
family, and (ii) the t-family. The former includes the normal and the double
exponential distributions as special cases, while the latter includes the normal as a limit.

For each family, we investigate the propriety of posteriors based on a class of priors
$\pi_{a,u}(\rho,\alpha,\beta,\sigma) \propto |\beta|^{a}\sigma^{-u}(h(\rho))^{-(1-a)/2}$, $0 \le a \le 1$, $u > 0$, which includes the reference
prior $\pi_R$, Jeffreys' prior $\pi_J$, and the second order probability matching prior $\pi_m$
as particular cases by proper choice of $a$ and $u$. In what follows, we shall write
$y_1 = (y_{111},\ldots,y_{1pn})$, $y_2 = (y_{211},\ldots,y_{2qn})$. We begin with the power family.

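Theorem 3.3.1 Let $f(z) = k(\zeta)\exp\{-|z|^{\zeta}\}$, $\zeta \ge 1$, where $k(\zeta)$ is the normalizing constant. Consider the class of priors

$$\pi_{a,u}(\rho,\alpha,\beta,\sigma) \propto |\beta|^{a}\sigma^{-u}(h(\rho))^{-(1-a)/2}, \quad 0 \le a \le 1,\ u > 0.$$

Then the posterior $\pi_{a,u}(\rho,\alpha,\beta,\sigma \mid y_1,y_2)$ is proper if $n(p+q) + u > 5$.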




Proof of Theorem 3.3.1 The proof is similar to that of Theorem (2.3.1) in Chapter 2.

Remark 3.3.1 It may be noted that the propriety of the posteriors does not depend on $\zeta$. All one needs is $n(p+q) > 4$ for the posteriors based on the reference prior $\pi_R$, Jeffreys' prior $\pi_J$ and the second order probability matching prior $\pi_m$ to all be proper.

Next  we  consider  the  t-family. 

Theorem 3.3.2 When $f(\cdot)$ is the t density function with d.f. $\nu > 0$, the posterior distributions are proper for the class of priors given in Theorem (3.3.1) if

i      •  r     i     «w      i  \     n(p  +  q)  +  u-l 
(nmm{p,q}-2)(v  +  l)  >    ^    ^  . 

Proof of Theorem 3.3.2 The proof is similar to the proof of Theorem (2.3.2) in Chapter 2.

When $f(\cdot)$ is a standard normal density, the explicit form of the posterior distribution of $\rho$ based on the second order probability matching prior $\pi_m$ is available. The result is given in the following theorem.




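To this end, we first need the following notation. Write $y_{1\cdot\cdot} = \sum_{k=1}^{n}\sum_{i=1}^{p}y_{1ik}$, $y_{2\cdot\cdot} = \sum_{k=1}^{n}\sum_{j=1}^{q}y_{2jk}$, $x_{1\cdot} = \sum_{i=1}^{p}x_{1i}$ and $x_{2\cdot} = \sum_{j=1}^{q}x_{2j}$. Let

$$S_{xy1} = \sum_{k=1}^{n}\sum_{i=1}^{p} y_{1ik}x_{1i} - \frac{(y_{1\cdot\cdot}+y_{2\cdot\cdot})\,x_{1\cdot}}{p+q},$$

$$S_{xy2} = \sum_{k=1}^{n}\sum_{j=1}^{q} y_{2jk}x_{2j} - \frac{(y_{1\cdot\cdot}+y_{2\cdot\cdot})\,x_{2\cdot}}{p+q},$$

$$S = \sum_{k=1}^{n}\sum_{i=1}^{p} y_{1ik}^2 + \sum_{k=1}^{n}\sum_{j=1}^{q} y_{2jk}^2 - \frac{(y_{1\cdot\cdot}+y_{2\cdot\cdot})^2}{n(p+q)},$$

$$d_u = \frac{2}{(p+q)n+u-4}\; S^{-\frac{(p+q)n+u-4}{2}},$$

$$c(\rho) = S - \frac{(\rho S_{xy2}+S_{xy1})^2\,(p+q)}{n\,h(\rho)},$$

$$e(\rho) = \left\{\frac{S - c(\rho)}{c(\rho)}\right\}^{1/2},$$

$$w_u(\rho) = \frac{2e(\rho)}{[c(\rho)]^{\frac{(p+q)n+u-4}{2}}}\int_0^{e(\rho)}\frac{1}{(1+w^2)^{\frac{(p+q)n+u-2}{2}}}\,dw.$$

Also recall that $h(\rho) = \rho^2[(p+q)\sum_{j=1}^{q}x_{2j}^2 - x_{2\cdot}^2] - 2\rho x_{1\cdot}x_{2\cdot} + (p+q)\sum_{i=1}^{p}x_{1i}^2 - x_{1\cdot}^2$.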

Theorem 3.3.3 Under the class of priors $\pi_u \propto |\beta|/\sigma^{u}$, $u > 0$, the marginal posterior distribution of $\rho$ is given by

$$\pi_u(\rho \mid y_1,y_2) \propto h^{-1}(\rho)\{d_u + w_u(\rho)\}.$$


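Proof of Theorem 3.3.3 The joint posterior of $\rho$, $\alpha$, $\beta$ and $\sigma$ is given by

$$\pi_u(\rho,\alpha,\beta,\sigma \mid y_1,y_2) \propto \frac{|\beta|}{\sigma^{n(p+q)+u}} \prod_{k=1}^{n}\prod_{i=1}^{p}\exp\left\{-\frac{(y_{1ik}-\alpha-\beta x_{1i})^2}{2\sigma^2}\right\} \prod_{k=1}^{n}\prod_{j=1}^{q}\exp\left\{-\frac{(y_{2jk}-\alpha-\rho\beta x_{2j})^2}{2\sigma^2}\right\}$$

$$= \frac{|\beta|}{\sigma^{n(p+q)+u}} \exp\left\{-\frac{n(p+q)}{2\sigma^2}\left(\alpha - \frac{y_{1\cdot\cdot}+y_{2\cdot\cdot}-n\beta(x_{1\cdot}+\rho x_{2\cdot})}{n(p+q)}\right)^2\right\} \exp\left\{-\frac{\frac{n}{p+q}h(\rho)\beta^2 - 2\beta(\rho S_{xy2}+S_{xy1}) + S}{2\sigma^2}\right\}.$$

Integrating with respect to $\alpha$, the joint posterior of $\rho$, $\beta$ and $\sigma$ is given by

$$\pi_u(\rho,\beta,\sigma \mid y_1,y_2) \propto \frac{|\beta|}{\sigma^{n(p+q)+u-1}} \exp\left\{-\frac{\frac{n}{p+q}h(\rho)\beta^2 - 2\beta(\rho S_{xy2}+S_{xy1}) + S}{2\sigma^2}\right\}.$$

Integrating with respect to $\sigma$, the joint posterior of $\rho$ and $\beta$ is given by

$$\pi_u(\rho,\beta \mid y_1,y_2) \propto \frac{|\beta|}{\left[\frac{n}{p+q}h(\rho)\beta^2 - 2\beta(\rho S_{xy2}+S_{xy1}) + S\right]^{\frac{n(p+q)+u-2}{2}}} = \frac{|\beta|}{\left[\frac{n}{p+q}h(\rho)\left(\beta - \frac{(p+q)(\rho S_{xy2}+S_{xy1})}{n\,h(\rho)}\right)^2 + c(\rho)\right]^{\frac{n(p+q)+u-2}{2}}}.$$

Next, by integrating out $\beta$, the marginal posterior of $\rho$ is given by

$$\pi_u(\rho \mid y_1,y_2) \propto \int_{-\infty}^{\infty} \frac{|\beta|}{\left[\frac{n}{p+q}h(\rho)\left(\beta - \frac{(p+q)(\rho S_{xy2}+S_{xy1})}{n\,h(\rho)}\right)^2 + c(\rho)\right]^{\frac{n(p+q)+u-2}{2}}}\,d\beta.$$

Let

$$w = \left(\beta - \frac{(p+q)(\rho S_{xy2}+S_{xy1})}{n\,h(\rho)}\right)\left[\frac{n\,h(\rho)\,c^{-1}(\rho)}{p+q}\right]^{1/2},$$

then

$$\pi_u(\rho \mid y_1,y_2) \propto \frac{1}{\{c(\rho)\}^{\frac{n(p+q)+u-4}{2}}}\,\frac{p+q}{n\,h(\rho)} \int_{-\infty}^{\infty} \frac{\left|w + (\rho S_{xy2}+S_{xy1})\left[\frac{p+q}{n\,h(\rho)\,c(\rho)}\right]^{1/2}\right|}{(w^2+1)^{\frac{n(p+q)+u-2}{2}}}\,dw.$$

The square of the shift inside the absolute value is $e^2(\rho)$, and by the symmetry of $(1+w^2)$ in $w$ the integral depends on the shift only through its absolute value $e(\rho)$. Now,

$$\int_{-\infty}^{\infty} \frac{|w+e(\rho)|}{(1+w^2)^{\frac{n(p+q)+u-2}{2}}}\,dw = 2\int_{e(\rho)}^{\infty} \frac{w}{(1+w^2)^{\frac{n(p+q)+u-2}{2}}}\,dw + 2e(\rho)\int_0^{e(\rho)} \frac{1}{(1+w^2)^{\frac{n(p+q)+u-2}{2}}}\,dw$$

$$= \frac{2}{[1+e^2(\rho)]^{\frac{n(p+q)+u-4}{2}}\,[n(p+q)+u-4]} + 2e(\rho)\int_0^{e(\rho)} \frac{1}{(1+w^2)^{\frac{n(p+q)+u-2}{2}}}\,dw.$$

Hence Theorem (3.3.3) follows by noticing $c(\rho) + e^2(\rho)c(\rho) = S$.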
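The key step of the proof is the integral identity for $\int |w+e|(1+w^2)^{-k}dw$ with $k = \{n(p+q)+u-2\}/2$. The following is a minimal numerical sketch of that identity, not part of the original derivation: the function names (`simpson`, `closed_form`, `brute_force`), the truncation limit and the grid sizes are all illustrative assumptions. For $k = 2$ the right-hand side reduces to $1 + e\arctan(e)$, which gives an exact check.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] using an even number n of subintervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

def closed_form(e, k):
    """Right-hand side: 1/((k-1)(1+e^2)^(k-1)) + 2|e| * int_0^|e| (1+w^2)^(-k) dw."""
    e = abs(e)
    tail = 1.0 / ((k - 1.0) * (1.0 + e * e) ** (k - 1.0))
    core = simpson(lambda w: (1.0 + w * w) ** (-k), 0.0, e)
    return tail + 2.0 * e * core

def brute_force(e, k, limit=200.0, n=200000):
    """Left-hand side: int |w+e| (1+w^2)^(-k) dw, truncated to [-limit, limit]
    and split at the kink w = -e so that each piece is smooth."""
    g = lambda w: abs(w + e) * (1.0 + w * w) ** (-k)
    return simpson(g, -limit, -e, n) + simpson(g, -e, limit, n)

if __name__ == "__main__":
    # k = 2 has the exact value 1 + e*atan(e); compare both sides for k = 3 as well.
    print(closed_form(1.0, 2), 1.0 + math.atan(1.0))
    print(closed_form(0.7, 3), brute_force(0.7, 3))
```

With $k - 1 = \{n(p+q)+u-4\}/2$, the first term of `closed_form` is the quantity that becomes $d_u$ after multiplying by $\{c(\rho)\}^{-(k-1)}$, and the second term becomes $w_u(\rho)$.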


Remark 3.3.2 In Theorem (3.3.3), $u = 1$ and $u = 4$ correspond to the second order probability matching prior and Jeffreys' prior, respectively.

Theorem 3.3.4 Under the class of priors $\pi^R_u \propto \dfrac{1}{h^{1/2}(\rho)\,\sigma^{u}}$, $u > 0$, the marginal posterior distribution of $\rho$ is given by

$$\pi^R_u(\rho \mid y_1,y_2) \propto h^{-1}(\rho)\{c(\rho)\}^{-\frac{n(p+q)+u-3}{2}}.$$
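The posterior kernel of Theorem 3.3.4 is simple enough to evaluate directly. The sketch below is illustrative only: it assumes the normal slope-ratio model with common intercept, uses the simulated data of Table 3.5 with the left block read as stimulus 1 and the right block as stimulus 2, and the helper names (`stats`, `h`, `c_of_rho`, `log_kernel`) are hypothetical.

```python
import math

# Simulated data of Table 3.5: rows are replicates k, columns are doses.
X1 = [0.5, 1.5, 2.5, 3.5]
X2 = [1.0, 2.0, 3.0, 4.0]
Y1 = [[2.613, 2.690, 2.509, 5.398],
      [1.946, 3.672, 0.986, 3.830],
      [2.805, 3.601, 2.535, 2.209]]
Y2 = [[2.700, 3.714, 4.519, 4.550],
      [3.010, 4.881, 5.517, 7.244],
      [3.024, 6.035, 6.090, 5.397]]

def stats(y1, y2, x1, x2):
    """Return (S, S_xy1, S_xy2) as defined before Theorem 3.3.3."""
    n, p, q = len(y1), len(x1), len(x2)
    total = sum(map(sum, y1)) + sum(map(sum, y2))
    sxy1 = sum(r[i] * x1[i] for r in y1 for i in range(p)) - total * sum(x1) / (p + q)
    sxy2 = sum(r[j] * x2[j] for r in y2 for j in range(q)) - total * sum(x2) / (p + q)
    s = (sum(v * v for r in y1 for v in r) + sum(v * v for r in y2 for v in r)
         - total ** 2 / (n * (p + q)))
    return s, sxy1, sxy2

def h(rho, x1, x2):
    """Quadratic h(rho) built from the covariates."""
    p, q = len(x1), len(x2)
    a = (p + q) * sum(v * v for v in x2) - sum(x2) ** 2
    c = (p + q) * sum(v * v for v in x1) - sum(x1) ** 2
    return a * rho * rho - 2.0 * rho * sum(x1) * sum(x2) + c

def c_of_rho(rho, y1, y2, x1, x2):
    """c(rho) = S - (p+q)(rho*S_xy2 + S_xy1)^2 / (n h(rho))."""
    n, p, q = len(y1), len(x1), len(x2)
    s, sxy1, sxy2 = stats(y1, y2, x1, x2)
    return s - (p + q) * (rho * sxy2 + sxy1) ** 2 / (n * h(rho, x1, x2))

def log_kernel(rho, u, y1, y2, x1, x2):
    """Log of h^{-1}(rho) * c(rho)^{-(n(p+q)+u-3)/2}, up to an additive constant."""
    big_n = len(y1) * (len(x1) + len(x2))
    return (-math.log(h(rho, x1, x2))
            - 0.5 * (big_n + u - 3) * math.log(c_of_rho(rho, y1, y2, x1, x2)))

if __name__ == "__main__":
    # Crude grid search for the dominating mode under u = 1.
    grid = [r / 100.0 for r in range(-1000, 1001)]
    print(max(grid, key=lambda r: log_kernel(r, 1, Y1, Y2, X1, X2)))
```

Working on the log scale avoids underflow, since $\{c(\rho)\}^{-(n(p+q)+u-3)/2}$ can be extremely small for moderate sample sizes.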




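Proof of Theorem 3.3.4 The joint posterior of $\rho$, $\alpha$, $\beta$ and $\sigma$ is given by:

$$\pi^R_u(\rho,\alpha,\beta,\sigma \mid y_1,y_2) \propto \frac{1}{h^{1/2}(\rho)\,\sigma^{n(p+q)+u}} \prod_{k=1}^{n}\prod_{i=1}^{p}\exp\left\{-\frac{(y_{1ik}-\alpha-\beta x_{1i})^2}{2\sigma^2}\right\} \prod_{k=1}^{n}\prod_{j=1}^{q}\exp\left\{-\frac{(y_{2jk}-\alpha-\rho\beta x_{2j})^2}{2\sigma^2}\right\}$$

$$= \frac{1}{h^{1/2}(\rho)\,\sigma^{n(p+q)+u}} \exp\left\{-\frac{n(p+q)}{2\sigma^2}\left(\alpha - \frac{y_{1\cdot\cdot}+y_{2\cdot\cdot}-n\beta(x_{1\cdot}+\rho x_{2\cdot})}{n(p+q)}\right)^2\right\} \exp\left\{-\frac{\frac{n}{p+q}h(\rho)\beta^2 - 2\beta(\rho S_{xy2}+S_{xy1}) + S}{2\sigma^2}\right\}.$$

Integrating with respect to $\alpha$, the joint posterior of $\rho$, $\beta$ and $\sigma$ is given by

$$\pi^R_u(\rho,\beta,\sigma \mid y_1,y_2) \propto \frac{1}{h^{1/2}(\rho)\,\sigma^{n(p+q)+u-1}} \exp\left\{-\frac{\frac{n}{p+q}h(\rho)\beta^2 - 2\beta(\rho S_{xy2}+S_{xy1}) + S}{2\sigma^2}\right\}.$$

Integrating with respect to $\sigma$, the joint posterior of $\rho$ and $\beta$ is given by

$$\pi^R_u(\rho,\beta \mid y_1,y_2) \propto \frac{1}{h^{1/2}(\rho)}\; \frac{1}{\left[\frac{n}{p+q}h(\rho)\left(\beta - \frac{(p+q)(\rho S_{xy2}+S_{xy1})}{n\,h(\rho)}\right)^2 + c(\rho)\right]^{\frac{n(p+q)+u-2}{2}}}.$$

Now, by integrating out $\beta$, the marginal posterior of $\rho$ is given by

$$\pi^R_u(\rho \mid y_1,y_2) \propto \frac{1}{h(\rho)}\; \frac{1}{\{c(\rho)\}^{\frac{n(p+q)+u-3}{2}}}.$$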

Remark 3.3.3 In Theorem (3.3.4), $u = 1$ and $u = 3$ correspond to the one-at-a-time and two-group reference priors, respectively.




Remark 3.3.4 In a slightly different context, Mendoza (1988) also derived the marginal posterior distributions of $\rho$ based on a class of priors $\pi \propto \nu(\rho)\,\sigma^{-u}$, $u > 0$, where $\nu(\rho)$ is an arbitrary function of $\rho$. This class of priors does not include Jeffreys' prior or any second order probability matching prior.

Remark 3.3.5 One may recognize (see also Mendoza (1990)) that $S - c(\rho)$ is precisely the pivotal quantity used in the classical approach to obtain a confidence region for $\rho$ via Fieller's theorem, which, in this case, coincides with the likelihood ratio based confidence region. Also, it is easy to see that $c(\rho)$ is the core element of the profile likelihood function of $\rho$.

Next, we discuss the behavior of the posterior distributions and address the design issues for the covariates $x_{1i}$, $i = 1,\ldots,p$, and $x_{2j}$, $j = 1,\ldots,q$, so that the posteriors provide more accurate information about the location of the true parameter $\rho_0$. The function $c(\rho)$ in both Theorems (3.3.3) and (3.3.4) plays a critical role in determining the shapes of all those posterior distributions. It is easy to see that $e(\rho)$ in Theorem (3.3.3) is monotonically decreasing in $c(\rho)$, and the same is true for $w_u(\rho)$. Thus, the second factors in the posterior distributions given in Theorems (3.3.3) and (3.3.4) all have a shape similar to that of $c^{-1}(\rho)$, which is maximized at


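$$\rho_{max} = \frac{S_{xy2}\{(p+q)\sum_{i=1}^{p}x_{1i}^2 - x_{1\cdot}^2\} + S_{xy1}x_{1\cdot}x_{2\cdot}}{S_{xy1}\{(p+q)\sum_{j=1}^{q}x_{2j}^2 - x_{2\cdot}^2\} + S_{xy2}x_{1\cdot}x_{2\cdot}} \qquad (3.3.1)$$

and is minimized at

$$\rho_{min} = -\frac{S_{xy1}}{S_{xy2}}. \qquad (3.3.2)$$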
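The two extrema of $c^{-1}(\rho)$ can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the slope-ratio model with responses $\alpha + \beta x_{1i}$ for stimulus 1 and $\alpha + \rho\beta x_{2j}$ for stimulus 2, generates noise-free data, and uses the hypothetical helper `rho_extrema`. With noise-free data $\rho_{max}$ should coincide with the true $\rho_0$.

```python
def rho_extrema(y1, y2, x1, x2):
    """Return (rho_max, rho_min), the maximizer and minimizer of 1/c(rho).

    y1[k][i] and y2[k][j] hold replicate k at dose i (stimulus 1) or j (stimulus 2).
    """
    p, q = len(x1), len(x2)
    total = sum(map(sum, y1)) + sum(map(sum, y2))
    sx1, sx2 = sum(x1), sum(x2)
    sxy1 = sum(r[i] * x1[i] for r in y1 for i in range(p)) - total * sx1 / (p + q)
    sxy2 = sum(r[j] * x2[j] for r in y2 for j in range(q)) - total * sx2 / (p + q)
    a = (p + q) * sum(v * v for v in x2) - sx2 ** 2   # coefficient of rho^2 in h(rho)
    c = (p + q) * sum(v * v for v in x1) - sx1 ** 2   # constant term of h(rho)
    b = sx1 * sx2                                     # half the linear coefficient
    rho_max = (sxy2 * c + sxy1 * b) / (sxy1 * a + sxy2 * b)
    rho_min = -sxy1 / sxy2
    return rho_max, rho_min

def noise_free(alpha, beta, rho, x1, x2, n):
    """Noise-free responses for n replicates (illustrative data only)."""
    y1 = [[alpha + beta * v for v in x1] for _ in range(n)]
    y2 = [[alpha + rho * beta * v for v in x2] for _ in range(n)]
    return y1, y2

if __name__ == "__main__":
    x1, x2 = [0.5, 1.5, 2.5, 3.5], [1.0, 2.0, 3.0, 4.0]
    y1, y2 = noise_free(2.0, 0.3, 10.0 / 3.0, x1, x2, n=3)
    print(rho_extrema(y1, y2, x1, x2))
```

The design and parameter values mirror those quoted for the simulated example ($\alpha = 2$, $\beta = 0.3$, $\rho\beta = 1$), but the responses here carry no error term.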


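Now,

$$\frac{1}{n}S_{xy1} \to \beta_0\left[\sum_{i=1}^{p}x_{1i}^2 - x_{1\cdot}\,\frac{x_{1\cdot}+x_{2\cdot}\rho_0}{p+q}\right] \quad a.e.,$$

$$\frac{1}{n}S_{xy2} \to \beta_0\left[\rho_0\sum_{j=1}^{q}x_{2j}^2 - x_{2\cdot}\,\frac{x_{1\cdot}+x_{2\cdot}\rho_0}{p+q}\right] \quad a.e., \qquad (3.3.3)$$

where $(\rho_0,\alpha_0,\beta_0,\sigma_0)$ are the true parameters. Hence,

$$\rho_{max} \to \rho_0 \quad a.e.$$

and

$$\rho_{min} \to -\frac{\sum_{i=1}^{p}x_{1i}^2 - x_{1\cdot}\frac{x_{1\cdot}+x_{2\cdot}\rho_0}{p+q}}{\rho_0\sum_{j=1}^{q}x_{2j}^2 - x_{2\cdot}\frac{x_{1\cdot}+x_{2\cdot}\rho_0}{p+q}} \quad a.e.$$

When $n(p+q)$ is large, the second factors ($\{d_u + w_u(\rho)\}$ in Theorem (3.3.3) and $\{c(\rho)\}^{-\frac{n(p+q)+u-3}{2}}$ in Theorem (3.3.4)) will dominate the shape of the posterior distributions. Thus, heuristically, the modes of those posterior distributions converge to the true parameter value $\rho_0$ as $n \to \infty$.

On the other hand, from a design point of view, a good strategy is to minimize the Bayes risk (under some loss function). Since the posterior means under both the second order probability matching prior and the reference priors are infinite, the only proper choice of loss function is the usual 0-1 loss function, which leads to the posterior mode. Also, since there are no analytical solutions for the posterior modes, and $c^{-1}(\rho)$ dominates the shape of the posteriors as the sample size becomes large, we propose an alternative strategy, which is to make $|\rho_{max} - \rho_{min}|$ as large as possible. Asymptotically, this is equivalent to making

$$\left|\rho_0 + \frac{\sum_{i=1}^{p}x_{1i}^2 - x_{1\cdot}\frac{x_{1\cdot}+x_{2\cdot}\rho_0}{p+q}}{\rho_0\sum_{j=1}^{q}x_{2j}^2 - x_{2\cdot}\frac{x_{1\cdot}+x_{2\cdot}\rho_0}{p+q}}\right|$$

as large as possible. If $x_{1\cdot} = 0$, $x_{2\cdot} = 0$, then the above simplifies to

$$\left|\rho_0 + \frac{\sum_{i=1}^{p}x_{1i}^2}{\rho_0\sum_{j=1}^{q}x_{2j}^2}\right|. \qquad (3.3.4)$$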




Hence a good design is to choose $\sum_{i=1}^{p}x_{1i}^2$ as large as possible, while keeping $\sum_{j=1}^{q}x_{2j}^2$ as small as possible.
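The design criterion can be compared across candidate designs with a few lines of code. This is a minimal sketch, assuming the asymptotic expression for $|\rho_{max} - \rho_{min}|$ given above; the function name `criterion` and the two toy designs are hypothetical and are not the designs of Tables 3.1 and 3.2.

```python
def criterion(rho0, x1, x2):
    """Asymptotic value of |rho_max - rho_min| for covariates x1 (stimulus 1) and x2 (stimulus 2)."""
    p, q = len(x1), len(x2)
    s1, s2 = sum(x1), sum(x2)
    shift = (s1 + s2 * rho0) / (p + q)
    num = sum(v * v for v in x1) - s1 * shift
    den = rho0 * sum(v * v for v in x2) - s2 * shift
    return abs(rho0 + num / den)

if __name__ == "__main__":
    rho0 = 10.0 / 3.0
    spread_x1 = criterion(rho0, [-3.0, 0.0, 3.0], [-1.0, 0.0, 1.0])  # large sum of x1^2
    spread_x2 = criterion(rho0, [-1.0, 0.0, 1.0], [-3.0, 0.0, 3.0])  # large sum of x2^2
    print(spread_x1, spread_x2)
```

Both toy designs are centered, so the criterion reduces to the simplified form (3.3.4), and the design that spreads out the $x_{1i}$ gives the larger value.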

This is evident in Figure 3.1, which plots two posterior distributions based on the probability matching prior $\pi_m$ for two simulated sets of data, given respectively in Tables 3.1 and 3.2. In design 1, $x_{1i}$, $i = 1,2,3$, are randomly taken from $N(0,10)$ and $x_{2j}$, $j = 1,2,3$, are randomly taken from $N(0,1)$, while in design 2, $x_{1i}$, $i = 1,2,3$, are randomly taken from $N(0,1)$ and $x_{2j}$, $j = 1,2,3$, are randomly taken from $N(0,10)$. It is clear that design 1 provides much more information about the position of the true parameter $\rho_0 = 10/3$ than design 2.

3.4    Numerical  Examples 

In order to see the performance of the non-informative priors derived in Section 3.2, we analyze two sets of data using the reference priors, the probability matching prior and Jeffreys' prior. The posterior quantiles (at levels 0.025, 0.05, 0.95 and 0.975) are computed and compared with the limits of the confidence interval (region) obtained via Fieller's theorem. The corresponding posterior probabilities of Fieller's confidence regions are also computed.

We begin with the first set of data, related to an assay of riboflavin in malt, given in Table 3.3. It was analyzed by Finney (1978, p. 161) using the classical approach and by Mendoza (1990) using the Bayesian approach with the two-group reference prior.

For this set of data, Fieller's 95% confidence interval obtained by Finney was (0.6464, 0.7235). From Table 3.4 we can see that the 0.025 and 0.975 quantile points corresponding to both $\pi_m$ and $\pi_R$ match Fieller's confidence limits up to 3 decimal places, while for the other priors the matching is up to two decimal places. The posterior probability of the interval (0.6464, 0.7235) for $\pi_m$ is closer to the nominal level (0.95) than for the other priors. This is not surprising because $\pi_m$ is a second order probability matching prior while the other priors are all first order probability matching priors. However, as shown in Figure 3.2, there is very little difference in the numerical results based on these priors.

Figure 3.1: Posterior Comparisons Between Two Designs

Table 3.1. Design 1

Simulated data ($n = 4$, $p = q = 3$, $\alpha = 2$, $\beta = 0.3$, $\rho\beta = 1$, $\sigma = 1$, $x_{1i} \sim N(0,10)$, $x_{2j} \sim N(0,1)$)

              Stimulus 1                  Stimulus 2
Dose      -0.047   1.693   0.749      1.864  10.560  -1.810

           2.242   2.048   0.845      2.768  12.044  -0.394
           3.546   2.324   1.979      4.186   9.319   0.318
           2.322   4.343   2.582      4.998  14.253   1.169
           3.318   3.144   2.032      4.472  13.889   1.771

Table 3.2. Design 2

Simulated data ($n = 4$, $p = q = 3$, $\alpha = 2$, $\beta = 0.3$, $\rho\beta = 1$, $\sigma = 1$, $x_{1i} \sim N(0,1)$, $x_{2j} \sim N(0,10)$)

              Stimulus 1                  Stimulus 2
Dose       7.756  -1.010  -8.096      5.284  -5.783  -7.104

           4.501   1.290  -1.792      6.730  -3.507  -4.403
           5.171   0.853   0.479      6.754  -4.431  -3.558
           4.503   0.124   0.611      7.992  -2.763  -5.625
           4.365   2.510  -1.115      8.683  -4.498  -5.839


Table 3.3. Responses in an assay of riboflavin in malt ($n = 4$, $p = 3$, $q = 2$)

              Stimulus 1            Stimulus 2
Dose       0      0.5    1.0       0.5    1.0

           38     97     167       80     121
           45     100    164       88     1 2^
           40     105    159       90     122
           44     98     156       82     122

Figure  3.2:  Posteriors  Based  On  Finney's  Data 




Table 3.4. Posterior quantiles and probabilities based on Finney's data

Prior          P_{0.025}   P_{0.05}   P_{0.95}   P_{0.975}   P{0.6464, 0.7235}
$\pi_m$        0.6462      0.6529     0.7169     0.7238      0.9489
$\pi_R$        0.6461      0.6529     0.7169     0.7238      0.9486
$\pi_{R_2}$    0.6485      0.6548     0.7149     0.7213      0.9608
$\pi_J$        0.6495      0.6556     0.7141     0.7203      0.9658

Figure 3.3: Posteriors Based On Simulated Data




Table 3.5. A simulated data set for the slope-ratio problem

Simulated data ($n = 3$, $p = q = 4$, $\alpha = 2$, $\beta = .3$, $\rho\beta = 1$, $\sigma = 1$)

                  Stimulus 1                          Stimulus 2
Dose       .5      1.5     2.5     3.5       1.0     2.0     3.0     4.0

           2.613   2.690   2.509   5.398     2.700   3.714   4.519   4.550
           1.946   3.672   .986    3.830     3.010   4.881   5.517   7.244
           2.805   3.601   2.535   2.209     3.024   6.035   6.090   5.397

In the second example, we reanalyze the simulated dataset of Mendoza (1990), given in Table 3.5. Mendoza calculated that Fieller's 95% confidence region is $(-\infty, -3.068) \cup (1.530, \infty)$, which is practically useless since it gives no clue even about which side of the real line the true parameter lies on. The posteriors based on Jeffreys' prior ($\pi_J$), the reference priors ($\pi_R$, $\pi_{R_2}$) and the probability matching prior ($\pi_m$) are plotted in Figure 3.3. They are all bimodal, but the mode on the right side dominates in each case: $\pi_m(\rho > 0 \mid y_1,y_2) = 0.8944$, $\pi_{R}(\rho > 0 \mid y_1,y_2) = 0.8863$, $\pi_{R_2}(\rho > 0 \mid y_1,y_2) = 0.8981$, $\pi_J(\rho > 0 \mid y_1,y_2) = 0.9102$. Furthermore, the dominating modes are given by

$$\rho_m = 2.241, \quad \rho_R = 2.211, \quad \rho_{R_2} = 2.261, \quad \rho_J = 2.311,$$

so that the posteriors all provide very clear information about the true value of the parameter $\rho = 10/3$.


3.5    Multiple  Linear  Regression 

The problems of estimating the ratio of two means in Chapter 2 and estimating the slope-ratio in this chapter can be generalized to the problem of estimating a ratio of two linear combinations of the coefficients in multiple linear regression. The model is given by

$$y_k = X\beta + e_k \qquad (3.5.1)$$




where $y_k = (y_{k1},\ldots,y_{km})^t$, $k = 1,\ldots,n$, are the observed responses, $X$ is the design matrix, $e_k = (e_{k1},\ldots,e_{km})^t$, and $e_{ki}$, $i = 1,\ldots,m$, are i.i.d. with density function $\frac{1}{\sigma}f(\frac{\cdot}{\sigma})$. The unknown parameters in the multiple linear regression model are $\beta = (\beta_1,\ldots,\beta_p)^t$ and $\sigma$.

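The parameter of interest is $\lambda_1^t\beta/\lambda_2^t\beta$, where $\lambda_1$, $\lambda_2$ are $p$ dimensional known vectors. Let $Q = (\lambda_1,\lambda_2,\ldots,\lambda_p)^t$ be a nonsingular matrix. Then (3.5.1) can be written as

$$y_k = XQ^{-1}Q\beta + e_k = Z\eta + e_k \qquad (3.5.2)$$

where

$$Z = XQ^{-1}$$

and

$$\eta = Q\beta = (\eta_1,\ldots,\eta_p)^t.$$

Now, the parameter of interest becomes $\eta_1/\eta_2$.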
For the above model (with parameter of interest $\eta_1/\eta_2$), Mendoza (1988) considered a class of non-informative priors with major emphasis on the reference priors. Our first objective here is to derive the class of first order probability matching priors and to compare it with the class of non-informative priors considered by Mendoza (1988). Furthermore, we investigate the conditions under which second order probability matching priors exist.

We should point out that, when $m$ is fixed, the order of probability matching is with respect to $n$. In such a case, the standard i.i.d. assumption is satisfied. Following the lines of Johnson (1970), Bickel and Ghosh (1990), and Mukerjee and Ghosh (1996), the validity of probability matching priors can be justified. On the other hand, when $n$ is fixed, a valid justification for probability matching priors needs some assumptions on $f(\cdot)$ and also on the design matrix $X$ (see Chapter 5); in this case, the order of probability matching is with respect to $m$ instead of $n$. In what follows, we will assume $m$ is fixed, but the results are essentially the same when $n$ is fixed.

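To facilitate the derivation of the probability matching priors, we need to find an orthogonal parametric transformation. Let

$$\psi_1 = \frac{\eta_1}{\eta_2}, \quad \psi_2 = \eta_2, \quad \psi_j = \eta_j, \ j = 3,\ldots,p.$$

The log-likelihood function under the $\psi = (\psi_1,\ldots,\psi_p,\sigma)$ parameterization is

$$L(\psi) = \sum_{k=1}^{n}\sum_{i=1}^{m}\left\{\log f\left(\frac{y_{ki} - z_{i1}\psi_1\psi_2 - z_{i2}\psi_2 - \sum_{j=3}^{p}z_{ij}\psi_j}{\sigma}\right) - \log\sigma\right\}.$$

The one unit Fisher information matrix $I(\psi_1,\psi_2,\ldots,\psi_p,\sigma)$ is then given by

$$\frac{1}{\sigma^2}\,\mathrm{diag}\{a_1 M,\ a_2\},$$

where

$$M = (M_{ij})_{p\times p},$$

with

$$M_{11} = \psi_2^2\sum_{i=1}^{m}z_{i1}^2;$$

$$M_{12} = M_{21} = \psi_2\sum_{i=1}^{m}(\psi_1 z_{i1}+z_{i2})z_{i1};$$

$$M_{1j} = M_{j1} = \psi_2\sum_{i=1}^{m}z_{i1}z_{ij} \quad \text{for } j = 3,\ldots,p;$$

$$M_{22} = \sum_{i=1}^{m}(z_{i1}\psi_1+z_{i2})^2;$$

$$M_{2j} = M_{j2} = \sum_{i=1}^{m}(z_{i1}\psi_1+z_{i2})z_{ij} \quad \text{for } j = 3,\ldots,p;$$

$$M_{kj} = \sum_{i=1}^{m}z_{ik}z_{ij} \quad \text{for } k, j = 3,\ldots,p.$$

One may recall that $a_1 = -\int \frac{d^2\log f(x)}{dx^2}f(x)\,dx$ and $a_2 = 1 - \int \frac{d^2\log f(x)}{dx^2}x^2 f(x)\,dx$.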

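Next, let $\psi_1 = \theta_1$, $\psi_2 = \psi_2(\theta_1,\ldots,\theta_p)$, $\ldots$, $\psi_p = \psi_p(\theta_1,\ldots,\theta_p)$, $\sigma = \sigma$. Then $\theta = (\theta_1,\ldots,\theta_p,\sigma)$ is an orthogonal parametric transformation if and only if the equations in (4) of Cox and Reid (1987) are satisfied. These equations, in our context, are given by

$$\sum_{i=1}^{m}(z_{i1}\psi_1+z_{i2})^2\frac{\partial\psi_2}{\partial\theta_1} + \sum_{i=1}^{m}(z_{i1}\psi_1+z_{i2})z_{i3}\frac{\partial\psi_3}{\partial\theta_1} + \cdots + \sum_{i=1}^{m}(z_{i1}\psi_1+z_{i2})z_{ip}\frac{\partial\psi_p}{\partial\theta_1} = -\psi_2\sum_{i=1}^{m}(\psi_1 z_{i1}+z_{i2})z_{i1},$$

$$\sum_{i=1}^{m}(z_{i1}\psi_1+z_{i2})z_{i3}\frac{\partial\psi_2}{\partial\theta_1} + \sum_{i=1}^{m}z_{i3}^2\frac{\partial\psi_3}{\partial\theta_1} + \cdots + \sum_{i=1}^{m}z_{i3}z_{ip}\frac{\partial\psi_p}{\partial\theta_1} = -\psi_2\sum_{i=1}^{m}z_{i1}z_{i3},$$

$$\vdots$$

$$\sum_{i=1}^{m}(z_{i1}\psi_1+z_{i2})z_{ip}\frac{\partial\psi_2}{\partial\theta_1} + \sum_{i=1}^{m}z_{ip}z_{i3}\frac{\partial\psi_3}{\partial\theta_1} + \cdots + \sum_{i=1}^{m}z_{ip}^2\frac{\partial\psi_p}{\partial\theta_1} = -\psi_2\sum_{i=1}^{m}z_{i1}z_{ip}.$$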


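Finding a general solution for the above equations would be difficult, if not impossible. However, one may easily see that under the conditions

$$\sum_{i=1}^{m}z_{ik}z_{ij} = 0 \qquad (3.5.3)$$

for $k = 1, 2$, $j = 3,\ldots,p$, and for $k \ne j$, $k, j = 3,\ldots,p$, an orthogonal transformation exists, which is given by

$$\psi_1 = \theta_1,$$

$$\psi_2 = \theta_2\Big/\Big(\sum_{i=1}^{m}(z_{i1}\theta_1+z_{i2})^2\Big)^{1/2},$$

$$\psi_3 = \theta_3,$$

$$\vdots$$

$$\psi_p = \theta_p,$$

$$\sigma = \sigma. \qquad (3.5.4)$$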
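That the transformation (3.5.4) solves the orthogonality equations under (3.5.3) can be verified in one line. The following worked check is a sketch added for the reader; the shorthand $D(\theta_1)$ is introduced here for convenience and is not notation from the text.

```latex
% D(\theta_1) is shorthand introduced for this check only.
\[
D(\theta_1) = \sum_{i=1}^{m}(z_{i1}\theta_1 + z_{i2})^2, \qquad
\psi_2 = \theta_2\, D^{-1/2}(\theta_1).
\]
% Differentiate \psi_2 with respect to \theta_1 (recall \psi_1 = \theta_1):
\[
\frac{\partial\psi_2}{\partial\theta_1}
  = -\tfrac{1}{2}\,\theta_2\, D^{-3/2}(\theta_1)\cdot 2\sum_{i=1}^{m}(z_{i1}\theta_1+z_{i2})z_{i1}
  = -\frac{\psi_2\sum_{i=1}^{m}(z_{i1}\psi_1+z_{i2})z_{i1}}{D(\theta_1)} .
\]
% Under (3.5.3) the cross sums with columns 3,...,p vanish and
% \partial\psi_j/\partial\theta_1 = 0 for j >= 3, so the first equation reduces to
\[
D(\theta_1)\,\frac{\partial\psi_2}{\partial\theta_1}
  = -\psi_2\sum_{i=1}^{m}(\psi_1 z_{i1}+z_{i2})z_{i1},
\]
% which holds by the display above; the remaining equations reduce to 0 = 0.
```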

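Again, the log-likelihood function $L(\theta)$ under the parameterization $\theta = (\theta_1,\theta_2,\ldots,\theta_p,\sigma)$ is

$$L(\theta) = \sum_{k=1}^{n}\sum_{i=1}^{m}\left\{\log f\left(\frac{y_{ki} - (z_{i1}\theta_1+z_{i2})\,\theta_2\big(\sum_{l=1}^{m}(z_{l1}\theta_1+z_{l2})^2\big)^{-1/2} - \sum_{j=3}^{p}z_{ij}\theta_j}{\sigma}\right) - \log\sigma\right\}.$$

The one unit Fisher information matrix is, after some algebra, given by

$$I(\theta) = \mathrm{diag}\{g_1, g_2, \ldots, g_p, g_{p+1}\}$$

where

$$g_1 = \frac{a_1\theta_2^2\big(\sum_{i=1}^{m}z_{i1}^2\sum_{i=1}^{m}z_{i2}^2 - (\sum_{i=1}^{m}z_{i1}z_{i2})^2\big)}{\big(\sum_{i=1}^{m}(z_{i1}\theta_1+z_{i2})^2\big)^2\sigma^2},$$

$$g_2 = \frac{a_1}{\sigma^2},$$

$$g_j = \frac{a_1\sum_{i=1}^{m}z_{ij}^2}{\sigma^2}, \quad j = 3,\ldots,p,$$

$$g_{p+1} = \frac{a_2}{\sigma^2},$$

$a_1$ and $a_2$ being defined in (3.2.3). Further, after some calculations,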


LliU^(M)3  =  o 
112      de\de2  {Y™M^  +  zi2f)2°2 


Lnj  =  0,j  =  3,...  ,p 


110*1)=  (£™1(ztl0i+^)2)2a3 


where  a3  =  d1(2  — /  xf(x)dx/a,i),  ai  and  di  are  defined  respectively  in  (3.2.3) 

and  (3.2.8). 

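Hence, following Tibshirani (1989), the class of first order probability matching priors is characterized by

$$\pi(\theta) \propto \frac{|\theta_2|}{\sigma\sum_{i=1}^{m}(z_{i1}\theta_1+z_{i2})^2}\; g(\theta_2,\ldots,\theta_p,\sigma) \qquad (3.5.5)$$

where $g(\cdot)$ is an arbitrary function differentiable in each argument.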
Now,  (2.10)  in  Mukerjee  and  Ghosh  (1996)  simplifies  to 

d  d 

— g{92, . . .  ,  9P,  a)  +  c—g{Q2,  ...,0p,o)  (3.5.6) 

where  c  =  t^th — 2  — r^ra  r— • 

A  class  of  solutions  for  (3.5.6)  is  given  by 

g(62,  • .  •  ,  0P, a)  oc    (c02  -  a2,  03,  •  •  •  ,  0,)  (3.5.7) 

where $H(\cdot)$ is an arbitrary smooth function differentiable in each argument. Once again, a subclass of the solutions (which does not depend on $f(\cdot)$ through $c$ or on the design matrix $X$ through $Z$) is:

$$g(\theta_2,\ldots,\theta_p,\sigma) \propto t(\theta_3,\ldots,\theta_p) \qquad (3.5.8)$$



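where $t(\cdot)$ is an arbitrary smooth function differentiable in each argument. Then the corresponding subclass of second order matching priors is

$$\pi_m(\theta) \propto \frac{|\theta_2|}{\sigma\sum_{i=1}^{m}(z_{i1}\theta_1+z_{i2})^2}\; t(\theta_3,\ldots,\theta_p).$$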

By the invariance property of probability matching priors (Datta and Ghosh, 1996), equivalently, under the parameterization $\eta = (\eta_1,\ldots,\eta_p,\sigma)$,

$$\pi_m(\eta) \propto \frac{|\eta_2|}{\sigma}\, t(\eta_3,\ldots,\eta_p). \qquad (3.5.9)$$


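Further, under the original parameterization,

$$\pi_m(\beta,\sigma) \propto \frac{|\lambda_2^t\beta|}{\sigma}\, t(\lambda_3^t\beta,\ldots,\lambda_p^t\beta). \qquad (3.5.10)$$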


The recommended second order matching prior, which does not depend on the choice of $\lambda_j$, $j = 3,\ldots,p$, is given by

$$\pi_m(\beta,\sigma) \propto \frac{|\lambda_2^t\beta|}{\sigma}. \qquad (3.5.11)$$

The above findings are summarized in the theorem below.

Theorem 3.5.1 For model (3.5.1), with the parameter of interest $\lambda_1^t\beta/\lambda_2^t\beta$, there exists a second order probability matching prior which does not depend on the error distribution $f(\cdot)$, the design matrix $X$, or the choice of $\lambda_j$, $j = 1, 3, \ldots, p$. This prior is given by

$$\pi_m(\beta,\sigma) \propto \frac{|\lambda_2^t\beta|}{\sigma},$$

if

$$(Q^{-1})^t X^t X Q^{-1} = \mathrm{diag}\{G, r_3, \ldots, r_p\} \qquad (3.5.12)$$

where $Q = (\lambda_1,\lambda_2,\ldots,\lambda_p)^t$, $G$ is any $2 \times 2$ matrix, and $r_j$, $j = 3,\ldots,p$, are scalars.

Proof of Theorem 3.5.1 This is already proved through the above steps.




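Corollary If $X^tX = tI_p$, where $t$ is an arbitrary real value, then

$$\pi_m(\beta,\sigma) \propto \frac{|\lambda_2^t\beta|}{\sigma}$$

is a second order probability matching prior for the parameter of interest $\lambda_1^t\beta/\lambda_2^t\beta$.

Proof Choose $\lambda_3,\ldots,\lambda_p$ such that $\lambda_1^t\lambda_j = 0$, $\lambda_2^t\lambda_j = 0$, $\lambda_j^t\lambda_j = 1$ for $j = 3,\ldots,p$, and $\lambda_j^t\lambda_k = 0$ for $j \ne k = 3,\ldots,p$. Then $Q^{-1} = (D, \lambda_3,\ldots,\lambda_p)$ with $D^t\lambda_j = 0$, $j = 3,\ldots,p$. Thus, condition (3.5.12) holds. The Corollary follows.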

Remark 3.5.1 It is easy to see that the probability matching priors derived in Chapter 2 (the generalized Fieller-Creasy problem) can be obtained directly from the Corollary as a particular case. The parametric transformation given in (3.5.4), on the other hand, is an orthogonal transformation for the slope-ratio problem (Sections 3.2-3.3) if and only if $x_{1\cdot} = 0$ and $x_{2\cdot} = 0$. Hence the condition given in (3.5.12) of Theorem (3.5.1) is a sufficient but not necessary condition to guarantee an orthogonal parametric transformation.

Remark 3.5.2 Many designs satisfy $X^tX = tI_p$. The usual $2^k$ factorial designs also satisfy this condition.
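The condition $X^tX = tI_p$ is easy to check for a given design. As a minimal sketch, assuming a full $2^k$ factorial with an intercept and main effects coded $-1/+1$ (the function names `factorial_design` and `gram` are illustrative), the Gram matrix is $2^k$ times the identity:

```python
from itertools import product

def factorial_design(k):
    """Model matrix of a full 2^k factorial: intercept column plus k main-effect columns coded -1/+1."""
    return [[1] + list(levels) for levels in product((-1, 1), repeat=k)]

def gram(X):
    """Return X^t X for a matrix stored as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

if __name__ == "__main__":
    for row in gram(factorial_design(3)):
        print(row)
```

Here $t = 2^k$, so the Corollary applies to this design for any choice of $\lambda_1$ and $\lambda_2$.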

The one-at-a-time reference prior, on the other hand, was obtained by Mendoza (1988). In our context, it is given by

r        1  1 


where $H(\rho)$ is a quadratic function of $\rho = \lambda_1^t\beta/\lambda_2^t\beta$ which depends on $X$ and $\lambda_j$, $j = 1,\ldots,p$. Under the condition in (3.5.12), $H(\rho) = G_{22}\rho^2 - 2G_{12}\rho + G_{11}$, where

$$G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}$$

is defined in Theorem (3.5.1). In this case the prior puts the maximum weight on $\rho^* = G_{12}/G_{22}$, and the weight is reduced when $\rho$ moves away from $\rho^*$. The performance of the posterior based on $\pi_R$ will, in general, depend on the choice of $X$ and $\lambda_j$, $j = 1,\ldots,p$. One effect of this kind has been seen in Chapter 2 through the simulation results.

On the other hand, the probability matching prior $\pi_m \propto |\lambda_2^t\beta|/\sigma$ puts less weight when $\lambda_2^t\beta$ is close to 0, and puts more weight when $|\lambda_2^t\beta|$ is large. The difficulties of estimating $\rho = \lambda_1^t\beta/\lambda_2^t\beta$ arise when $|\lambda_2^t\beta|$ tends to be very small. Hence $\pi_m$ seems to provide the right remedy for this problem. This is also seen in the simulation studies in Chapter 2.

When $f(\cdot)$ is normal, an explicit form of the posterior distribution based on the prior $\pi \propto |\lambda_2^t\beta|/\sigma$ is available. To simplify the notation in what follows, we assume $n = 1$ in model (3.5.1). This includes the case (with appropriate replication in $X$) considered above.

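We need some new notation before stating the result. Let $Z = XQ^{-1}$, where $Q = (\lambda_1,\lambda_2,\ldots,\lambda_p)^t$ is a nonsingular matrix, and let $\hat\eta = (Z^tZ)^{-1}Z^tY = Q(X^tX)^{-1}X^tY$ be the least-squares estimate of $\eta = Q\beta$. Let

$$\hat\Sigma = \frac{Y^t(I_m - X(X^tX)^{-1}X^t)Y}{\nu}\,(Z^tZ)^{-1}$$

and let $\hat\Sigma_{11}$ be the $2 \times 2$ upper-left diagonal block of $\hat\Sigma$, where $\nu = m - p$.

Further, let

$$k_2 = \frac{\nu^{\nu/2}\,\Gamma(\frac{\nu+2}{2})}{\pi\,\Gamma(\frac{\nu}{2})},$$

$$u(\rho) = (\rho,1)\hat\Sigma_{11}^{-1}(\rho,1)^t,$$

$$b(\rho) = \frac{(\rho,1)\hat\Sigma_{11}^{-1}(\hat\eta_1,\hat\eta_2)^t}{u(\rho)},$$

$$c(\rho) = \nu + (\hat\eta_1,\hat\eta_2)\hat\Sigma_{11}^{-1}(\hat\eta_1,\hat\eta_2)^t - u(\rho)b^2(\rho),$$

$$e(\rho) = b(\rho)\left[\frac{u(\rho)}{c(\rho)}\right]^{1/2}.$$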


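Theorem 3.5.2 Under the multiple linear regression model (3.5.1), the posterior distribution of $\rho = \lambda_1^t\beta/\lambda_2^t\beta$ ($\lambda_1$, $\lambda_2$ are known $p$ dimensional real vectors) based on the prior $\pi \propto |\lambda_2^t\beta|/\sigma$ is given by

$$\pi_m(\rho \mid x,y) \propto \frac{k_2|\hat\Sigma_{11}|^{-1/2}}{u(\rho)}\left\{\frac{2}{\nu\,[\nu + (\hat\eta_1,\hat\eta_2)\hat\Sigma_{11}^{-1}(\hat\eta_1,\hat\eta_2)^t]^{\nu/2}} + \frac{2|e(\rho)|}{c^{\nu/2}(\rho)}\,\frac{\sqrt{\pi}\,\Gamma(\frac{\nu+1}{2})}{\Gamma(\frac{\nu+2}{2})}\left\{T_{\nu+1}\big(\sqrt{\nu+1}\,|e(\rho)|\big) - \frac{1}{2}\right\}\right\},$$

where $T_{\nu+1}$ is the t distribution function with d.f. $= \nu + 1$.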

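Proof of Theorem 3.5.2 Under the parameterization $(\eta,\sigma)$, the corresponding prior is $\pi \propto 1/\sigma$, and the joint posterior distribution of $\eta$ and $\sigma$ is

$$\pi(\eta,\sigma \mid Y) \propto \frac{1}{\sigma^{m+1}}\exp\left\{-\frac{1}{2\sigma^2}(Y-Z\eta)^t(Y-Z\eta)\right\} = \frac{1}{\sigma^{m+1}}\exp\left\{-\frac{1}{2\sigma^2}\left[(Y-Z\hat\eta)^t(Y-Z\hat\eta) + (\eta-\hat\eta)^tZ^tZ(\eta-\hat\eta)\right]\right\}.$$

Integrating with respect to $\sigma$, the posterior distribution of $\eta$ satisfies

$$\pi(\eta \mid y) \propto \frac{1}{[(Y-Z\hat\eta)^t(Y-Z\hat\eta) + (\eta-\hat\eta)^tZ^tZ(\eta-\hat\eta)]^{m/2}} \propto \frac{1}{[m-p+(\eta-\hat\eta)^t\hat\Sigma^{-1}(\eta-\hat\eta)]^{\frac{m-p+p}{2}}}.$$

Noticing that the above form has the structure of a multivariate t density with d.f. $\nu = m - p$, location vector $\hat\eta$ and scale matrix $\hat\Sigma$, the joint posterior density function of $\eta$ is given by

$$\pi(\eta \mid y) = \frac{k_p|\hat\Sigma|^{-1/2}}{[\nu + (\eta-\hat\eta)^t\hat\Sigma^{-1}(\eta-\hat\eta)]^{\frac{\nu+p}{2}}},$$

where $k_p$ is the normalizing constant. Hence, the joint posterior distribution of $\eta_1$ and $\eta_2$ is given by (cf. pp. 136-137, Press, 1982)

$$\pi(\eta_1,\eta_2 \mid y) = \frac{k_2|\hat\Sigma_{11}|^{-1/2}}{\left[\nu + \big((\eta_1,\eta_2)-(\hat\eta_1,\hat\eta_2)\big)\hat\Sigma_{11}^{-1}\big((\eta_1,\eta_2)-(\hat\eta_1,\hat\eta_2)\big)^t\right]^{\frac{\nu+2}{2}}}. \qquad (3.5.13)$$

Now, the problem of deriving the posterior distribution of $\rho = \eta_1/\eta_2$ becomes the problem of deriving a t-ratio distribution. This is also discussed by Press (1969). By changing variables in (3.5.13), letting $(\eta_1,\eta_2) = (\rho r, r)$, the joint posterior density of $\rho$ and $r$ is given by

$$\pi(\rho,r \mid y) = \frac{k_2|\hat\Sigma_{11}|^{-1/2}|r|}{\left[\nu + \big((\rho r,r)-(\hat\eta_1,\hat\eta_2)\big)\hat\Sigma_{11}^{-1}\big((\rho r,r)-(\hat\eta_1,\hat\eta_2)\big)^t\right]^{\frac{\nu+2}{2}}}$$

$$= \frac{k_2|\hat\Sigma_{11}|^{-1/2}|r|}{\left[\nu + (\rho,1)\hat\Sigma_{11}^{-1}(\rho,1)^t r^2 - 2(\rho,1)\hat\Sigma_{11}^{-1}(\hat\eta_1,\hat\eta_2)^t r + (\hat\eta_1,\hat\eta_2)\hat\Sigma_{11}^{-1}(\hat\eta_1,\hat\eta_2)^t\right]^{\frac{\nu+2}{2}}}$$

$$= \frac{k_2|\hat\Sigma_{11}|^{-1/2}|r|}{[u(r-b)^2 + c]^{\frac{\nu+2}{2}}}, \qquad (3.5.14)$$

where $u = u(\rho)$, $b = b(\rho)$ and $c = c(\rho)$. Let $w = c^{-1/2}u^{1/2}(r-b)$. By changing variables in (3.5.14) and integrating over $w$, the marginal posterior of $\rho$ is given by

$$\pi_m(\rho \mid y) = \frac{k_2|\hat\Sigma_{11}|^{-1/2}}{c^{\nu/2}u}\int_{-\infty}^{\infty}\frac{|w + bc^{-1/2}u^{1/2}|}{(w^2+1)^{\frac{\nu+2}{2}}}\,dw$$

$$= \frac{k_2|\hat\Sigma_{11}|^{-1/2}}{c^{\nu/2}u}\left[\int_{-bc^{-1/2}u^{1/2}}^{\infty}\frac{w + bc^{-1/2}u^{1/2}}{(w^2+1)^{\frac{\nu+2}{2}}}\,dw - \int_{-\infty}^{-bc^{-1/2}u^{1/2}}\frac{w + bc^{-1/2}u^{1/2}}{(w^2+1)^{\frac{\nu+2}{2}}}\,dw\right]$$

$$= \frac{k_2|\hat\Sigma_{11}|^{-1/2}}{c^{\nu/2}u}\left[\frac{2}{\nu(1+b^2c^{-1}u)^{\nu/2}} + 2|bc^{-1/2}u^{1/2}|\int_0^{|bc^{-1/2}u^{1/2}|}\frac{1}{(1+w^2)^{\frac{\nu+2}{2}}}\,dw\right]$$

$$= \frac{k_2|\hat\Sigma_{11}|^{-1/2}}{u(\rho)}\left\{\frac{2}{\nu\,[\nu + (\hat\eta_1,\hat\eta_2)\hat\Sigma_{11}^{-1}(\hat\eta_1,\hat\eta_2)^t]^{\nu/2}} + \frac{2|e(\rho)|}{c^{\nu/2}(\rho)}\int_0^{|e(\rho)|}\frac{1}{(1+w^2)^{\frac{\nu+2}{2}}}\,dw\right\}.$$

Since $\int_0^{|e|}(1+w^2)^{-\frac{\nu+2}{2}}dw = \frac{\sqrt{\pi}\,\Gamma(\frac{\nu+1}{2})}{\Gamma(\frac{\nu+2}{2})}\{T_{\nu+1}(\sqrt{\nu+1}\,|e|) - \frac{1}{2}\}$, Theorem 3.5.2 follows.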


CHAPTER  4 

THE  MULTIVARIATE  LINEAR  CALIBRATION  PROBLEM 

4.1  Introduction 

The calibration problem involves making inferences about a fixed but unknown explanatory variable $\zeta$ corresponding to a response variable $Y$. Since the appearance of Krutchkoff's (1967) controversial paper in Technometrics, a large number of papers have appeared on the calibration problem. Univariate Bayesian calibration has been discussed by Hoadley (1970) and Hunter and Lamboy (1981). Hoadley showed that Krutchkoff's (1967) inverse regression estimator can be interpreted in a Bayesian way. Hunter and Lamboy (1981), on the other hand, tended to provide Bayesian support for the classical estimator (cf. Eisenhart, 1939). However, the prior of Hunter and Lamboy was criticized by both Hill (1981) and Lawless (1981). In particular, Hill criticized the authors for introducing a prior "with no motivation and no attempt to understand either its implications or its relationship with other prior distributions".

Multivariate calibration originated with Brown (1982). Many subsequent papers have focused on finding a good confidence region. Brown's (1982) exact confidence region is invariant (under nonsingular linear transformations), but may not be an ellipsoid and may even be empty. Oman's (1988) confidence region is always nonempty, but it is conservative and is not invariant in general. Mathew and Kasala's (1994) exact confidence region is nonempty and invariant; however, it may not be an ellipsoid. Mathew and Zha (1996) give a conservative confidence region which is nonempty and invariant. How to construct an exact, invariant, ellipsoidal and nonempty frequentist confidence region for a future $\zeta$ is still an open problem. The difficulty of finding a good frequentist confidence region arises because the model represents a curved exponential family when the dimension of the response variable $Y$ is larger than the dimension of the explanatory variable $X$, and also because of the problems that arise when the slope $\beta$ is close to zero.

Ghosh, Carlin and Srivastava (1995) discussed a Bayesian approach to univariate calibration with the reference priors and the first order probability matching priors. They show that Jeffreys' (1961) non-informative prior, as well as the one used by Hunter and Lamboy (1981), are first order probability matching priors. In this chapter, we extend their work by finding the second order probability matching priors, and by studying the multivariate linear calibration problems.

Section 2 considers multivariate linear calibration when the explanatory variable $\zeta$ is a scalar. The complete class of first order probability matching priors is derived. Furthermore, when the covariance matrix is known or known up to a scalar, second order probability matching priors are also derived. In particular, for the univariate linear calibration case, it turns out that the prior of Hunter and Lamboy (1981) is a second order probability matching prior. The marginal posterior distribution corresponding to the second order probability matching prior is derived and its properties are discussed.

Section 3 considers the multivariate linear calibration problem when the explanatory variable $\zeta$ is a vector. A class of first order probability matching priors and a complete catalog of reference priors (including Jeffreys' prior) are found. It turns out that all reference priors are first order probability matching priors, and Jeffreys' prior is a first order probability matching prior only in the unidimensional calibration case. Marginal posterior distributions of $\zeta$ corresponding to the reference priors are derived, and their properties are discussed.




4.2    Multivariate Linear Calibration - $\zeta$ is a scalar
4.2.1    First order probability matching priors

In the multivariate linear calibration problem ($\zeta$ is a scalar), the calibration experiment can be represented as

$$Y_i = \alpha + \beta x_i + \epsilon_i, \quad i = 1,\ldots,n, \qquad (4.2.1)$$

and the prediction experiment can be represented as

$$Z_j = \alpha + \beta\zeta + \epsilon'_j, \quad j = 1,\ldots,m, \qquad (4.2.2)$$

where $Y_i$ $(p \times 1)$ and $Z_j$ $(p \times 1)$ are the observed values, $x_i$ are fixed scalar explanatory variables, $\epsilon_i$ $(p \times 1)$ and $\epsilon'_j$ $(p \times 1)$ are i.i.d. $N(0,\Sigma)$, and $\alpha$ $(p \times 1)$, $\beta$ $(p \times 1)$, $\Sigma$ $(p \times p)$ and $\zeta$ (a scalar) are unknown parameters.

In this problem, the parameter of interest is $\zeta$, an unknown explanatory variable for future responses. The parameters $\alpha$, $\beta$, and $\Sigma$ are nuisance parameters. Without loss of generality, we assume $\sum_{i=1}^{n}x_i = 0$. We may notice that the slope-ratio model (3.2.1)-(3.2.2) in Chapter 3 includes univariate calibration ($p = 1$). This can be seen by letting $x_{2j} = 1$ and $n = 1$ in model (3.2.1)-(3.2.2). Hence the results derived in Sections 3.2-3.3 apply readily to univariate calibration ($p = 1$). But in general, this is not true for multivariate linear calibration problems.

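The likelihood function $L(\zeta, \alpha, \beta, \Sigma)$ based on (4.2.1) and
(4.2.2) is given by:

    L(\zeta, \alpha, \beta, \Sigma) = (2\pi)^{-(m+n)p/2} |\Sigma|^{-(m+n)/2}
        \exp\Big\{ -\frac{1}{2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^t \Sigma^{-1} (y_i - \alpha - \beta x_i)
                   -\frac{1}{2} \sum_{j=1}^{m} (z_j - \alpha - \beta\zeta)^t \Sigma^{-1} (z_j - \alpha - \beta\zeta) \Big\}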



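By solving partial differential equations as in (4.) in Cox and Reid (1987),
after some algebra, we get an orthogonal transformation:

    \theta_1 = \zeta
    \theta_2 = \alpha + \frac{m}{m+n} \beta \zeta
    \theta_3 = \beta u^{1/2}(\zeta)
    \Sigma   = \Sigma

where

    c_{xx} = \sum_{i=1}^{n} x_i^2
    u(\zeta) = c_{xx} + \frac{mn}{m+n} \zeta^2        (4.2.3)

Equivalently,

    \zeta  = \theta_1
    \alpha = \theta_2 - \frac{m}{m+n} \frac{\theta_1}{u^{1/2}(\theta_1)} \theta_3
    \beta  = u^{-1/2}(\theta_1) \theta_3
    \Sigma = \Sigma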

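Now, the likelihood function $L(\theta_1, \theta_2, \theta_3, \Sigma)$ under
the new parameterization is:

    L(\theta_1, \theta_2, \theta_3, \Sigma) = (2\pi)^{-(m+n)p/2} |\Sigma|^{-(m+n)/2}
        \exp\Big\{ -\frac{1}{2} \sum_{i=1}^{n}
              \big(y_i - \theta_2 + \frac{m}{m+n} \theta_1 \theta_3 u^{-1/2}(\theta_1) - u^{-1/2}(\theta_1) \theta_3 x_i\big)^t \Sigma^{-1}
              \big(y_i - \theta_2 + \frac{m}{m+n} \theta_1 \theta_3 u^{-1/2}(\theta_1) - u^{-1/2}(\theta_1) \theta_3 x_i\big)
          -\frac{1}{2} \sum_{j=1}^{m}
              \big(z_j - \theta_2 + \frac{m}{m+n} \theta_1 \theta_3 u^{-1/2}(\theta_1) - u^{-1/2}(\theta_1) \theta_3 \theta_1\big)^t \Sigma^{-1}
              \big(z_j - \theta_2 + \frac{m}{m+n} \theta_1 \theta_3 u^{-1/2}(\theta_1) - u^{-1/2}(\theta_1) \theta_3 \theta_1\big) \Big\}.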




The  Fisher  information  matrix  7(0)  is 


0  (m  +  n)£-1  0  0 
0  0  -^-E_1  0 

m+n 

a  fl  H  9S  m+n 

U  U  U  2 


From Tibshirani (1989), the complete class of first order probability
matching priors is given by:

    \pi(\theta) \propto \frac{(\theta_3^t \Sigma^{-1} \theta_3)^{1/2}}{u(\theta_1)} g(\theta_2, \theta_3, \Sigma)        (4.2.4)

where $g(\cdot)$ is an arbitrary nonnegative function with continuous third
partial derivatives. Since probability matching priors are invariant under any
one-to-one reparameterization (see Datta and Ghosh, 1996),

    \pi(\zeta, \alpha, \beta, \Sigma) \propto (\beta^t \Sigma^{-1} \beta)^{1/2} u^{(p-1)/2}(\zeta)
        d\big(\alpha + \frac{m}{m+n} \zeta\beta, \; u^{1/2}(\zeta)\beta, \; \Sigma\big)        (4.2.5)

is the complete class of first order probability matching priors under the
original parameterization, where $d$ is an arbitrary nonnegative smooth
function. A subclass of the first order probability matching priors is given
by

    \pi(\zeta, \alpha, \beta, \Sigma) \propto (\beta^t \Sigma^{-1} \beta)^{1/2} u^{(p-1)/2}(\zeta) d\big(u^{1/2}(\zeta)\beta, \Sigma\big)        (4.2.6)

Remark 4.2.1 When $p = 1$, (4.2.5) becomes:

    \pi(\zeta, \alpha, \beta, \sigma^2) \propto \frac{|\beta|}{\sigma}
        d\big(\alpha + \frac{m}{m+n} \zeta\beta, \; u^{1/2}(\zeta)\beta, \; \sigma^2\big)        (4.2.7)

It can be verified that the priors given in (4.2.7) are solutions of the
probability matching equations given in (5) in Ghosh, Carlin and Srivastava
(1995). These equations are based on the results from Datta and Ghosh (1996).
Also, the class of first order probability matching priors in (4.2.6) becomes

    \pi(\zeta, \alpha, \beta, \sigma^2) \propto \frac{|\beta|}{\sigma} d\big(u^{1/2}(\zeta)\beta, \sigma^2\big)

which is exactly the same as the general class of solutions considered by
Ghosh, Carlin and Srivastava (1995).

4.2.2    Second  order  probability  matching  priors: 

Since there are infinitely many first order probability matching priors, it is
difficult to make a choice within this class. The requirement of second order
probability matching dramatically narrows down the choice among the available
members.

For a general unknown covariance matrix $\Sigma$, we have found that a second
order probability matching prior may not exist. However, second order
probability matching priors do exist when $\Sigma$ is known, or when
$\Sigma = \sigma^2 V$ with $V$ known and $\sigma^2$ unknown. We summarize our
findings in the theorem below.

Theorem 4.2.1 Consider the model given in (4.2.1) and (4.2.2). If (i)
$p = 1, 2$, or (ii) $p > 2$ and the parameter space of $\theta_3$ is bounded
away from zero, then

1.)

    \pi(\theta_1, \theta_2, \theta_3) \propto (\theta_3^t \Sigma^{-1} \theta_3)^{-(p-2)/2} \frac{g(\theta_2)}{u(\theta_1)}

is a second order probability matching prior for known $\Sigma$;

2.) if $\Sigma = \sigma^2 V$, with $V$ known and $\sigma^2$ unknown,

, e2,  e3, a2)  oc  (e'V%rVg(e2) 

is a second order probability matching prior. Here $g(\cdot)$ is an arbitrary
smooth nonnegative function and $u(\theta_1)$ is defined in (4.2.3).

The  proof  of  this  theorem  is  deferred  to  the  appendix. 

Remark 4.2.2 Under the original parameterization, the second order probability
matching prior transforms to:




1.) 


when $\Sigma$ is known;


2.) 


9(a). 


a 


2 


when $\Sigma = \sigma^2 V$, $V$ is known, $\sigma^2$ is unknown.

Remark 4.2.3 When $p > 2$, if the parameter space of any $\theta_{3i}$
($\beta_i$) ($i = 1, 2, \ldots, p$) includes zero, then the corresponding
posterior distributions are improper, hence the priors given in Theorem 4.2.1
are not second order probability matching priors. It may be interesting to
notice that this problem is closely related to the findings in
Theorem (4.2.1) (ii) of Brown (1982), that is, even when the hypothesis
Ho: X\fi = 0 for any given Xo is rejected, the frequentist confidence region
based on the likelihood ratio statistic may be empty.

Remark 4.2.4 Stein (1956) showed that when the dimension $p \ge 3$, the usual
maximum likelihood estimate ($\bar X$) of the population mean $\mu$ of the
multivariate normal distribution is inadmissible under the sum of squared
errors loss function. This phenomenon may have the same implications as the
results given in Theorem 4.2.1. This is because the derivation of probability
matching priors is based on asymptotic expansions of the posterior
distributions around the maximum likelihood estimators. Thus, if the maximum
likelihood estimators do not provide good estimates of the true parameters, we
may anticipate that a good matching may also not exist.
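
The inadmissibility phenomenon mentioned in Remark 4.2.4 can be illustrated by
a small simulation. The sketch below is not part of the derivation above: it
uses the James-Stein shrinkage estimator, a later explicit construction that is
swapped in here purely for illustration, and the dimension, true mean, seed and
number of replicates are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, reps = 5, 20000                  # hypothetical dimension (>= 3) and replicate count
mu = np.ones(p)                     # hypothetical true mean vector

# One N(mu, I_p) observation per replicate; X itself is the MLE of mu.
X = rng.normal(loc=mu, scale=1.0, size=(reps, p))

# James-Stein estimator: shrink each observation toward the origin.
norm2 = np.sum(X**2, axis=1, keepdims=True)
js = (1.0 - (p - 2) / norm2) * X

# Monte Carlo estimates of the risk under sum of squared errors loss.
mle_risk = np.mean(np.sum((X - mu) ** 2, axis=1))
js_risk = np.mean(np.sum((js - mu) ** 2, axis=1))
print(mle_risk, js_risk)
```

Under these assumptions the estimated risk of the MLE is close to $p$, while
the shrinkage estimator has a visibly smaller estimated risk.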

From the above example, and also from many other examples, we find that if a
second order probability matching prior does not exist, difficulties can
usually also be anticipated in building a confidence region by the standard
frequentist approach.




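Remark 4.2.5 In the univariate calibration ($p = 1$), when $\sigma^2$ is known,

    \pi(\theta) \propto \frac{|\theta_3|}{u(\theta_1)} g(\theta_2)    (i.e. \pi(\zeta, \alpha, \beta) \propto |\beta| g(\alpha))

is a second order probability matching prior. When $\sigma^2$ is unknown,

    \pi(\theta) \propto \frac{|\theta_3|}{\sigma^2 u(\theta_1)} g(\theta_2)    (i.e. \pi(\zeta, \alpha, \beta, \sigma^2) \propto |\beta| \sigma^{-2} g(\alpha))

is a second order probability matching prior.

Remark 4.2.6 The prior used by Hunter and Lamboy (1981),
$\pi \propto |\beta| \sigma^{-2}$, is a second order probability matching
prior.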

Next, we derive the posterior distribution of $\zeta$ based on the second
order probability matching prior in univariate calibration. To this end, we
first introduce some notation.


I   n  ^  to 

y  =  z  =  —  ^Zi; 


syy 


-xy 


=  ^LiVi  -  y)2;      szz  =  Y,(zi  -  ^)2; 

t=l  i=l 
n  n 


t=l  t=l 


27TU  2 

C(C)  =  sS  +  ^  +  ^(^-^)2-^1(0&2(0; 


c(C)  =  |6(C)«-«(C)c-»(C)|. 




Theorem 4.2.2 In univariate calibration ($p = 1$), the marginal posterior
distribution of $\zeta$ based on the second order probability matching prior
$\pi \propto |\beta| / \sigma^2$ is given by


p(C|y»«) 


do 


m+n v 

«(0 


+  2e(0c-*(C)jf  — 


(1  +X2)  » 

which has finite posterior mean and infinite variance, where $f = n + m - 3$.

Proof of Theorem 4.2.2 This result is a particular case of Theorem (3.5.2) in
Chapter 3. Let

/  ,     ,  ,     ,  ,  \ 


1  1 
0  0 


1  1 

xn  0 
0  1 


Then 


X%X  = 


^  n  +  m    0    m ^ 


0  cxx  0 
m       0  m 


Let 


I 

0 

_1 

\ 

n 

n 

0 

1 

0 

_  I 

0 

m+n 

J 

n 

mn 

Q  = 


( 


\ 


'  o  o  1  ^ 

0  1  0 

1  0  0 


Then 




Hence 


f)  =  Q(XtX)-'XtY  =  (z-y,^,y); 


E  = 


Syy  +  Sz 


c2 
cxx 


m+n 
mn 


0  -A 


CXX 


1 

n 


1  \ 


0  1 


1 

n  I 


ivum)  =  {z-y,—)\ 


En  = 


v 


(  m+n      n  \ 


0  r- 


Thus 


=  [J^{z_v)2  +  ck]/syy  +  s» 

m  +  n  cxx  v 


**»» 

) 


|Sn|»  =( 


mn  cxa; 


) 


The  result  now  follows  from  Theorem  (3.5.2)  of  Chapter  3,  after  some  straightforward 
algebra. 


Remark 4.2.7 The posterior distribution is unimodal and symmetric about 0 if
and only if $\bar y = \bar z$. When $\bar y > \bar z$, the posterior mode
(larger mode) is positive or negative depending on whether $c_{xy} > 0$
($\hat\beta > 0$) or $c_{xy} < 0$ ($\hat\beta < 0$). Similarly, when
$\bar y < \bar z$, the posterior mode is positive or negative depending on
whether $c_{xy} > 0$ ($\hat\beta > 0$) or $c_{xy} < 0$ ($\hat\beta < 0$). In
both cases, as $|c_{xy}|$ moves away from zero, one mode becomes strongly
dominant, and the other lies far out in the tail and is negligible. In the
case $c_{xy} = 0$ (equivalently, the least-squares estimator of $\beta$ is 0),
the posterior distribution is symmetric and bimodal.

4.3    Multivariate Linear Calibration - $\zeta$ is a vector

In general multivariate linear calibration, the calibration model is given by

    Y_i = \alpha + B^t x_i + \epsilon_i,    i = 1, \ldots, n        (4.3.1)

and the prediction model is

    Z_j = \alpha + B^t \zeta + \epsilon'_j,    j = 1, \ldots, m        (4.3.2)

where $Y_i (p \times 1)$, $Z_j (p \times 1)$ are real observed values,
$x_i (q \times 1)$ are fixed explanatory variables, $\alpha (p \times 1)$,
$B (q \times p)$, $\zeta (q \times 1)$ are unknown parameters, and
$\epsilon_i (p \times 1)$ and $\epsilon'_j (p \times 1)$ are mutually
independent and distributed as $N(0, \Sigma)$. Without loss of generality, we
assume $\sum_{i=1}^{n} x_{ij} = 0$ for $j = 1, \ldots, q$, where
$x_i^t = (x_{i1}, \ldots, x_{iq})$. The parameter of interest is the $q$
dimensional unknown explanatory vector $\zeta$. The log-likelihood function
$\log L(\zeta, \alpha, B, \Sigma)$ under the model (4.3.1) and (4.3.2) is,
up to an additive constant:

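    -\frac{m+n}{2} \log|\Sigma|
    - \frac{1}{2} \sum_{i=1}^{n} (y_i - \alpha - B^t x_i)^t \Sigma^{-1} (y_i - \alpha - B^t x_i)
    - \frac{1}{2} \sum_{j=1}^{m} (z_j - \alpha - B^t \zeta)^t \Sigma^{-1} (z_j - \alpha - B^t \zeta)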




The  Fisher  information  matrix  is: 


/(C,o,B,E) 
( 


mBS-'B4       rnBE"1  mCT  0  -BE-1 

mY.-xBt  (m  +  njE"1  mCT  ®  E"1 

mC^E-1^'  mC^E"1  *i*,r  +  mCCT)  ®  E"1 

0                 0  0 


0 
0 
0 

m+n 
d£    "     2  / 


and  the  inverse  of  /(£,  a,  (3,  E)  is 


with 


7-1(C,a,B,E)  = 


i4n  412  i4i3  0  N 

A2i  A22  A23  0 

An  A32  A33  0 

0      0  0  A44  j 


An    =  u(QQ 
An 
An 
A22 


-1 


A23 
A33 
A44 


Al  =  -l-Q~lB 
n 


n       n  ' 


AI2  =  -^-\CKTC^  0  (E  -  BtQ~lB) 


n 


C"1  0  E  -  ti-^OCtf  C^Ctf  0  (E  -  B*Q~1B) 

dE  2 
9E_1  m  +  n 




where

    C_{xx} = \sum_{i=1}^{n} x_i x_i^t        (4.3.3)
    Q = B \Sigma^{-1} B^t        (4.3.4)
    u(\zeta) = \frac{1}{m} + \frac{1}{n} + \zeta^t C_{xx}^{-1} \zeta        (4.3.5)
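
As a sanity check on how the vector-case function $u(\zeta)$ relates to the
scalar-case one, the sketch below evaluates both for $q = 1$. It assumes the
scalar form $c_{xx} + \frac{mn}{m+n}\zeta^2$ and the vector form
$\frac{1}{m} + \frac{1}{n} + \zeta^t C_{xx}^{-1} \zeta$; the function names
and the numerical values are hypothetical.

```python
import numpy as np

def u_scalar(zeta, x, m):
    """Scalar-case u: c_xx + mn/(m+n) * zeta^2, for a centered design x of length n."""
    n = x.size
    return np.sum(x**2) + m * n / (m + n) * zeta**2

def u_vector(zeta, X, m):
    """Vector-case u: 1/m + 1/n + zeta' C_xx^{-1} zeta, for an n x q design X."""
    n = X.shape[0]
    C_xx = X.T @ X
    zeta = np.atleast_1d(np.asarray(zeta, dtype=float))
    return 1.0 / m + 1.0 / n + zeta @ np.linalg.solve(C_xx, zeta)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # hypothetical centered design
m, zeta = 3, 0.7                            # hypothetical values
ratio = u_vector(zeta, x[:, None], m) / u_scalar(zeta, x, m)
print(ratio)
```

For $q = 1$ the two functions differ only by the constant factor
$(m+n)/(mn\, c_{xx})$, which does not depend on $\zeta$ and is therefore
absorbed by the proportionality sign in the priors.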

Since there is now more than one parameter of interest and parametric
orthogonality does not hold, the results given by Datta (1996) can be used to
derive first order probability matching priors. Since the assumption of
independence between $\zeta$ and $(\alpha, B, \Sigma)$ is reasonable, our
derivation of probability matching priors is within the subclass where
$\pi(\zeta, \alpha, B, \Sigma) = \pi_1(\zeta) \pi_2(\alpha, B, \Sigma)$.

Lemma  4.3.1  Under  the  model  (4.3.1)  and  (4.3.2), 

7r(C,  a,  B,  E)  cx  u^CCMa,  B,  E) 
is a first order probability matching prior if $\pi_2(\cdot)$ is smooth and satisfies:


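    B_i^t \frac{\partial \log \pi_2(\alpha, B, \Sigma)}{\partial B_j} =
        \begin{cases} 0 & i \ne j \\ d & i = j \end{cases},    i, j = 1, \ldots, q

where $B_i$ is the $i$th column of $B^t$ and $d$ is any real value.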

The  proof  of  this  lemma  is  technical  and  tedious.  It  is  given  in  the  appendix. 
Lemma 4.3.2 For any positive definite matrix $M$ (free of $B$),

    B_i^t \frac{\partial \log |B M B^t|}{\partial B_j} =
        \begin{cases} 0 & i \ne j \\ 2 & i = j \end{cases},    i, j = 1, \ldots, q

The  proof  of  Lemma  4.3.2  is  also  given  in  the  Appendix. 
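
The identity in Lemma 4.3.2 is easy to check numerically. The sketch below is
an illustration only: it reads $B_i$ as the $i$th row of the $q \times p$
matrix $B$ (the $i$th column of $B^t$), approximates the gradient of
$\log|BMB^t|$ by central differences, and uses randomly generated,
hypothetical $B$ and $M$.

```python
import numpy as np

def logdet_BMBt(B, M):
    """log |B M B^t| for a q x p matrix B and a p x p positive definite M."""
    _, logdet = np.linalg.slogdet(B @ M @ B.T)
    return logdet

def grad_fd(B, M, eps=1e-6):
    """Central-difference gradient of log|B M B^t| with respect to each entry of B."""
    G = np.zeros_like(B)
    for r in range(B.shape[0]):
        for c in range(B.shape[1]):
            E = np.zeros_like(B)
            E[r, c] = eps
            G[r, c] = (logdet_BMBt(B + E, M) - logdet_BMBt(B - E, M)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
q, p = 2, 4                                  # hypothetical dimensions with q <= p
B = rng.normal(size=(q, p))
A = rng.normal(size=(p, p))
M = A @ A.T + p * np.eye(p)                  # a positive definite matrix free of B

G = grad_fd(B, M)                            # row j holds d log|B M B^t| / d B_j
lhs = B @ G.T                                # entry (i, j) is B_i^t d log|B M B^t| / d B_j
print(np.round(lhs, 5))
```

The matrix `lhs` comes out as twice the $q \times q$ identity, in line with
the closed form $\partial \log|BMB^t| / \partial B = 2 (BMB^t)^{-1} B M$ for
symmetric $M$.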




We now give the main theorem of this section. This theorem provides a class of
first order probability matching priors for the general multivariate linear
calibration problem.

Theorem  4.3.1  Under  the  models  (4.3.1)  and  (4.3.2), 

tt(C,  a,  B,  E)  oc  MC)}*^  II  |fl*Afi(E)B|**(£)  (4.3.6) 

i=l 

is a class of first order probability matching priors for the parameter
$\zeta$, where $l$ is any nonnegative integer, the $M_i(\Sigma)$ are positive
definite $p \times p$ matrices, $s$ is any nonnegative smooth function, each
$d_i$ is any real value, and $d = \sum_{i=1}^{l} d_i$.

Proof of Theorem 4.3.1 This is an immediate consequence of Lemmas 4.3.1 and
4.3.2.

Remark  4.3.1  When  q  =  1,  the  probability  matching  priors  in  Theorem  (4.3.1)  belong 
to  the  class  of  the  first  order  probability  matching  priors  given  in  (4.2.5). 

Jeffreys' prior can be obtained by calculating the square root of the
determinant of Fisher's information matrix $I(\zeta, \alpha, B, \Sigma)$,
which is, after some algebra, given by:

r'CCa^E)  (xu^iOlBH-'B^m-^ 

Following Berger and Bernardo (1992a), using rectangular compacts for $\zeta$,
$B$, $\Sigma$, and using the formula for $I^{-1}(\zeta, \alpha, B, \Sigma)$,
the reference priors can be easily derived. Table 4.1 provides a complete
catalog of reference priors for the multivariate linear calibration problem,
where $\zeta$ is always the parameter of interest, and the remaining
parameters are split into one, two, or three groups according to their order
of importance. The calculations are similar to those of Kubokawa and Robert
(1994) and Plessis et al. in the unidimensional multivariate calibration case
(for two-group reference priors). One may notice that Plessis, Merwe and
Groenewald's results for


Jeffreys' prior and reference priors may contain some typographical errors.  For  example,  a  2
in  their  (2.1)  should  be  a"(p+1),  and  |E|^+1)  in  their  (5.3)  should  be 
etc. Comparing Table 4.1 with the results in Theorem 4.3.1, we can see that a
reference prior (including Jeffreys' prior) belongs to the class of the first
order probability matching priors if and only if $p = q = 1$, which is the
univariate linear calibration case.

For reference priors, the explicit form of the marginal posterior distribution
can be obtained. Let $\hat\alpha$, $\hat B$ be the least-squares estimators of
$\alpha$ and $B$ based on the observations from the calibration experiment
(4.3.1) only, and let

    S_+ = \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat B^t x_i)(Y_i - \hat\alpha - \hat B^t x_i)^t
          + \sum_{j=1}^{m} (Z_j - \bar Z)(Z_j - \bar Z)^t

where $\bar Z = \frac{1}{m} \sum_{j=1}^{m} Z_j$.

Theorem  4.3.2  With  the  prior  7r(C,cv,5,E)  oc  lEr^^trift)  ,  where  d  >  0,  the 
marginal  posterior  distribution  of  £  is: 


i     /  --,  ,  m+n-2<;-p-l+d 

p(C|^)oc-  >(C)I  2 


MC)  +  (Z-&-  BPQTSi\Z  -a-  BTQ}m+nTlfd 

which  are  integrable  and  have  up  to  (q  +  p  -  2)/2  th  finite  moments. 

Proof of Theorem 4.3.2 The derivation of the posterior distribution follows
along the lines of Brown (1982) and Press (1982), pp. 186-188. The proofs of
integrability and of finiteness of the moments are also straightforward and
are omitted here.




Table  4.1.  Catalog  of  reference  priors 


Grouped  parameters  in  their  order  of  importance 

Prior  distribution 

{C, 

oc  J — p+^+2  u  2  (£)      (Jeffreys'  prior) 

OC  |S|          2^2  (Q 

{C},  {<*,£},{£} 
{C}, 

OC  |£|     '  M   2  (0 

oc  |E|    2  u  2  (Q 

{C},  {£} 

{C},{B},{a,E} 
{C}, 

{C},{a},{B,E} 

OC  |E|       2    u  2(Q 
OC  \T,\        2    M     2  (Q 

oc  |E|     2   u  2{Q 

OC  |E|          2      M  2(£) 

{(},{<*},  {£},{£} 

{C},M,{E},{s} 

{C},{5},{a},{E} 
{C},{5},{S},{«} 
{C},{E},  {<*},{£} 
{<},{£},{B},{a} 

oc  L     2^2 (r) 

II                       \  3  / 
OC  |E|        2^2  (Q 

OC          — £2iW_2  (C) 

OC  |L|     2  m   2  (Q 

oc  |EI    2  u  2  (Q 

3C     E|        2    M     2  (£) 

4.4  Appendix: 

Proof of Theorem 4.2.1:
Case 1. $\Sigma$ is known.

Under the orthogonal parameterization, a prior in (4.2.4) is a second order
matching prior if and only if it satisfies the partial differential equation
(2.10) in Mukerjee and Ghosh (1996), that is:

1  k  k 

-t/(^)A(/i?^,i,i)  +  ££  Dv{i;hnJsvd{9^)}  =  0  (4.4.1) 

D  v=2s=1 

In  our  context,  k  =  2p,  8^  =  (62,  63),  L1>lfl  =  E{^f  =  0 




•^112 


^  Lu(p+\)  j 


a3  logi 


■^ll(p+2) 


^  Al(2p+1)  y 


^96,  "     /x2^)  3 


/-1(^i,e2,e3)  = 


0 

0 

1 

m+1 

0 

0 

m+nr 
ran  / 


Let $I^{sv}$ denote the $(s,v)$th element of $I^{-1}(\theta_1, \theta_2, \theta_3)$. Then (4.4.1) simplifies to


P      ft      v  v 


e=l  9^3e  i=l  j=l 


u(gl)  =  0, 

(e3E-ie3)*  v  " 

where $\sigma_{ij}$, $\sigma^{ij}$ are respectively the $(i,j)$th elements of $\Sigma$ and $\Sigma^{-1}$.

I   j  =  e 

Since  2i=i  aijaei  =  S  •  (4.4.2)  further  simplifies  to 

0  j^e 

e=1  w3e      (9|E  193)2 

One  class  of  solutions  is  :  d{6^)  oc  (©^E-^-V^©,,) 

Then a class of second order probability matching priors is given by


(4.4.2) 


7^,0*  e3,  a)   cx  (9|S"lffVff(02) 

u(6'i) 




Case  2.  The  proof  is  similar  to  the  previous  case. 


Proof of Lemma 4.3.1:

Let 

6*     =  (<*,  a\  {vec{^))\  En, ...  ,  Elp,  E22, . . .  ,  E2p, . . .  ,  Ew) 

4    =  u(OQu 

*<*)  =  « 

where $Q^{ii}$ is the $i$th diagonal element of $Q^{-1}$, $Q$ is defined in
(4.3.4), $I^{i}(\theta)$ is the $i$th column of $I^{-1}(\theta)$,
$i = 1, \ldots, q$, $B^t = (B_1, \ldots, B_q)$, and
$\mathrm{vec}(B^t) = (B_1^t, B_2^t, \ldots, B_q^t)$.
Following  Datta  (1996),  we  only  need  to  solve  the  differential  equations 

£^M%(0)]  =  O  for  i  =  !,...,(?,  (4.4.3) 

This  is  equivalent  to  : 


L, — ^  ^)  +  L(^W^)^r  =  o    fon  =  i,. 


dOj        -    ^  -  (4A4) 


After  some  algebra,  (4.4.4)  can  be  written  as 


,  ^iiw  d7r(6)     Al2{i)  d7r(6)     An®  dn(9) 


where $A_{11}(i)$ denotes the $i$th row of the matrix $A_{11}$. With the
independence assumption
$\pi(\zeta, \alpha, B, \Sigma) = \pi_1(\zeta) \pi_2(\alpha, B, \Sigma)$,
(4.4.5) becomes




This  simplifies  to 


*n(.)     d(      -    dutv[       dc       j    d^rl  dvec{Bt)  \ 

g/op7r2(g,g,E)  dlog^ja^B^) 

 Ya  M^     dvec(B<)  (4A6) 


We  may  see  that 


"   1  dr>ec(£') 


gxl 


=  -(p-i)g-1c-1c, 


and 


13     dvec(B*)  ^    WU^("  ^AJ> 

where  W  =  (ff*Xi?B'S))w-  Then  (4.4.6)  becomes: 

^  n  (4.4.8) 

Let $\pi_2(\alpha, B, \Sigma)$ be free of $\alpha$; then (4.4.8) becomes

,(C)^f^  =  ^+(P-2)/,)C-'C  (4.4.9) 




Lemma 4.3.1 follows.
Proof of Lemma 4.3.2

Let $u_j$ $(p \times 1)$, $j = 1, \ldots, p$, be the unit vector with $j$th
element equal to 1. We use the following result:

dlQ9\M\  _  tr  ( Arl  (dMv 


da 


da 


qxqj 


where $M = (M_{ij})$ is any $q \times q$ non-singular matrix. Then


( 


aiog|BMB£| 


=  tr 


(BMB) 


t\-\ 


0 
0 
0 


u)MBx     ...  0 


0 

2u)MBi 
0 


u)MBq 


0 

u)MBq 
0 


Hence 


B 


t  a  log  | BMB* | 
dBi 


tr 


(BMB*) 


t\-\ 


\ 


0 

B\MB\ 
0 


BtMBi     ...  0 


0 

2B{MBi 
0 


...  B\MBq 


0 

B\MBq 
0 


2  k  =  i 
0  fc^i 


for $k, i = 1, \ldots, q$. Lemma 4.3.2 follows.


CHAPTER  5 

ASYMPTOTIC  EXPANSIONS  FOR  POSTERIOR  PROBABILITY  IN  REGRESSION  MODEL 

5.1  Introduction 

Much of the literature on matching frequentist and Bayesian coverage
probabilities is based on the assumption that the observations are independent
and identically distributed (i.i.d.) from a member of a distribution family.
Lee (1989) discussed priors which achieve this matching for one-sided and
symmetric confidence intervals in the non-i.i.d. case. His derivations rely
heavily on the work of Durbin (1980), which assumes that the maximum
likelihood estimators (MLE) of the parameters are sufficient statistics. Such
a requirement usually does not hold outside of the exponential family.

In general, the derivation of matching priors needs assumptions ensuring a
valid asymptotic expansion of the posterior distribution and a valid
frequentist Edgeworth expansion (cf. J.K. Ghosh, 1994). Johnson (1970)
investigated the asymptotic expansions of the posterior distribution for a one
parameter family of distributions in the independent and identically
distributed (i.i.d.) setup. His proof involves, among other things, repeated
use of a version of the uniform strong law, which is difficult to verify in
the non-i.i.d. case. We extend Johnson's results to the location-scale linear
regression model. With some simple assumptions, we establish certain
properties of uniform strong consistency. Our derivations follow Johnson's
very closely, but are necessarily modified.

In Section 2, we give the location-scale linear regression model, some
notation and basic assumptions. Since no rigorous result is available in the
literature for the strong consistency of the MLE under the general linear
regression model, and this result is crucial for a valid asymptotic expansion
of the posterior probability, we prove the strong consistency of the maximum
likelihood estimator (MLE) under this model, with some assumptions, in
Section 3. In Section 4 we establish valid asymptotic expansions of the
posterior probability of any $p + 2$ dimensional Borel set. We show that, with
probability one, the centered and scaled posterior distribution possesses an
asymptotic expansion in powers of $n^{-1/2}$ having the standard multivariate
normal as a leading term. The number of terms in the expansion obtained is two
less than the number of partial derivatives of the loglikelihood. All terms
beyond the first consist of a multivariate polynomial multiplied by the
multivariate normal density. Finally, in Section 5, some further discussion is
offered.

5.2    Model,  Notation  and  Assumptions 

Model: Let $y_1, y_2, \ldots$ be a sequence of scalar observations from the
location-scale family

    Y_i = \beta^t X_i + \sigma \epsilon_i        (5.2.1)

where the $\epsilon_i$ are independent and identically distributed random
variables with density function $f(\epsilon)$,
$\beta^t = (\beta_0, \ldots, \beta_p)$ is a vector of the intercept and slope
parameters, and $X_i^t = (1, x_{i1}, \ldots, x_{ip})$ are covariates.

Let $L_n(\theta, Y, X) = \prod_{i=1}^{n} \sigma^{-1} f((Y_i - \beta^t X_i)/\sigma)$
be the likelihood function and $\hat\theta_n$ be the maximum likelihood
estimator of the true parameter value $\theta_0 = (\beta_0, \sigma_0)$ based
on the observations $Y = (y_1, \ldots, y_n)$ and the covariates
$X = (X_1, \ldots, X_n)$. We shall usually drop the subscript $n$, $Y$ and $X$
in the likelihood function and drop the subscript $n$ in $\hat\theta_n$.
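
To make the likelihood of the location-scale model concrete, the sketch below
evaluates $\log L_n$ for a user-supplied log-density. It is only an
illustration under stated assumptions: the error density is taken to be
standard normal (one admissible choice of $f$, for which the MLE of $\beta$ is
the least-squares estimator), and the simulated design, sample size and true
parameter values are hypothetical.

```python
import numpy as np

def log_likelihood(beta, sigma, y, X, log_f):
    """log L_n = sum_i log f((y_i - beta' X_i) / sigma) - n log(sigma)."""
    resid = (y - X @ beta) / sigma
    return np.sum(log_f(resid)) - y.size * np.log(sigma)

def log_std_normal(e):
    """Log density of the standard normal, used here as an example error density f."""
    return -0.5 * e**2 - 0.5 * np.log(2 * np.pi)

rng = np.random.default_rng(2)
n = 200                                           # hypothetical sample size
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])  # rows are (1, x_i1)
beta0, sigma0 = np.array([1.0, -2.0]), 0.5        # hypothetical true parameter theta_0
y = X @ beta0 + sigma0 * rng.normal(size=n)

# With normal errors the MLE is least squares for beta and sqrt(RSS / n) for sigma.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma_hat = np.sqrt(np.mean((y - X @ beta_hat) ** 2))

ll_hat = log_likelihood(beta_hat, sigma_hat, y, X, log_std_normal)
ll_true = log_likelihood(beta0, sigma0, y, X, log_std_normal)
print(ll_hat, ll_true)
```

Any other density can be passed through `log_f`; for a non-normal $f$ the
maximizer is no longer available in closed form and would have to be found
numerically.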




Notation: Our main results and derivations involve extensive notation. To
simplify the presentation, some notation is defined here. More notation will
be given later in each section whenever needed.

Let  Jp+1  (6)  and  ej?  jp+1  (6)  be  respectively  the  mth  partial  derivative  of  the 
loglikelihood  function  and  its  expectation. 


Al  *hW  = 


d"l\ogL{6) 


JO,...,Jp+1v  dpi0,...  ,d/3Jp"da^ 

1     n     p  jp+i 

=  (-ir^BIK*)  E(/m-fc(^w)(^w)ip+i-fe4Jp+1); 

°     i=l  e=0  k=0 

4L*»«  -  ^  ^(^Hr^Dk). 

u        i=l  e=0 

where  cip+1>m  =  /  Ei=o  dkjp+lfm-k{x)xj"+'-kf{x)  dx,  m  =  j0  +  . . .  +  jp+u 

dkjp+i  are  nonnegative  integers  which  depend  on  k  and  jp+\.  Let  B  =  Bn  and  be 

respectively  the  sample  Fisher  information  matrix  and  the  Fisher  information  matrix 


a2 


where  a0  =  -Ef2{Z),  ax  =  -Ef2{Z)Z,  o2  =  1  -  Ef2(Z)Z2,  Z  ~  /(*).  Further  let 
/jt(-)  be  the  fcth  derivative  of  logf(x),  \$  be  the  jth  largest  eigenvalue  of  the  positive 
definite  matrix  M,  Z4(0)  =  and  h  =  hn  =  n*{0  -  0). 

We study the behavior of the posterior distribution of the centered and scaled
variable $\omega$, where

    \omega = B^{1/2} h.


Basic  assumptions: 




Assumption  1.  f(x)  has  (r+3)th  continuous  derivatives.  Whenever  (a-l)2+62  < 
52  <  1,  S  is  small  enough, 

+  6)  -  < 

with 

/oo 
\fi(x)\  •  \x\%  -f(x)dx  <  co 
-oo 

/oo 
i7t0*0  •  |^|*  •  f(x)dx  <  oo 
■00 

fori  =  0,1, 2,...  ,r  +  3. 

Assumption 2. When $(a, b) \ne (1, 0)$, $f(ax + b) a \ne f(x)$ for at least one $x$.
Assumption 3. For arbitrary $(a, b) \in R^2$, $\int_{-\infty}^{\infty} |f_1(ax + b)| f(x) dx < \infty$.
Assumption 4. $f_2(x) < 0$.

Assumption 5. Let $s(\zeta, \rho)$ be a sphere centered at $\zeta \in R^2$
with radius $\rho > 0$. Then for sufficiently small $\rho$,
$\sup_{(a,b) \in s(\zeta,\rho)} f(ax + b)$ is a measurable function of $x$, and
$\int_{-\infty}^{\infty} [\log \sup_{(a,b) \in s(\zeta,\rho)} f(ax + b)]^{+} f(x) dx < \infty$.

Assumption  6.  jjXB||  =  0(1),  JE?=1^  ->  someX,  and  ^E?=i(*i  ~  - 
^4pxp,  where  Apxp  is  positive  definite. 

Assumption  7.  0  C  0  open,  0  compact  C  BP  x  R+  .  tt(6)  has  (r  +  l)th  continuous 
derivatives.  ir(0)  >  0,  whenever  6  C  ©i  open  C  0  ,  and  n(9)  —  0  for  0  e  0^. 

We now make some remarks on the assumptions. Assumptions 2-6 are needed in the
proof of strong consistency of the maximum likelihood estimator of the true
parameter $\theta_0$ (Theorem (5.3.1)). Assumptions 2, 3 and 5 are similar to
the classical assumptions of Wald (1949), Assumption 6 is the usual assumption
on the design matrix for consistency of the least squares estimator of
$\theta_0$ under the normal linear model, and Assumption 4 is there for
technical reasons. Assumption 1 is essentially the only key assumption for the
asymptotic expansion of the posterior probability (Theorem (5.4.1)). It is
used in Lemma 4 to substitute for a version of the uniform strong law employed
repeatedly by Johnson (1970) in the i.i.d. case.

5.3    Strong  Consistency 

In this section we prove the strong consistency of the MLE. This property is
crucial for the valid asymptotic expansions of posterior probabilities. The
proof is analogous to the techniques of Wald (1949), but necessarily more
complicated due to the involvement of the covariates $X$.

Theorem 5.3.1 Let $\hat\theta_n$ be a maximum likelihood estimator of the true
parameter $\theta_0$. Then under Assumptions 2-6,

    P\{ \lim_{n \to \infty} \hat\theta_n = \theta_0 \} = 1.

Remark 5.3.1 The proof of Theorem (5.3.1) will make it clear that a maximum
likelihood estimator exists. The uniqueness of this estimator is not settled
by this theorem. But for the asymptotic expansion of posterior probabilities,
only the existence, not the uniqueness, is needed.

Remark 5.3.2 Consistency of the maximum likelihood estimator in the
independent but non-i.i.d. case has been considered by many authors. Among
them, Bradley & Gart (1962) and Hoadley (1971) discussed weak consistency of
the maximum likelihood estimators, and Chao (1970) discussed strong
consistency of the maximum likelihood estimators. Chao's assumptions require,
in our context, that


uniformly in $i$ a.s. $[P]$ as $\|\theta\| \to \infty$. One may recall that
$\theta = (\beta, \sigma)$. This condition, in general, is not satisfied by
our model and assumptions.

Before we can prove the above theorem, we need some new notation and three
lemmas. Let $\zeta = (a, b_0, b_1) \in R^3$, and let $S(\zeta, \rho)$ be a
sphere centered at $\zeta$ with radius $\rho\ (> 0)$. Define

    U(Z, S(\zeta, \rho)) = \sup_{(a', b_0', b_1') \in S(\zeta, \rho)} f_1(a'Z + b_0') b_1'
    L(Z, S(\zeta, \rho)) = \inf_{(a', b_0', b_1') \in S(\zeta, \rho)} f_1(a'Z + b_0') b_1'
    W(Z, S(\zeta, \rho)) = \log[ \sup_{(a', b_0', b_1') \in S(\zeta, \rho)} f(a'Z + b_0') a' ]

where $Z \sim f(z)$ and $f_1(\cdot)$ is defined in Section 2. Let
$u(S(\zeta, \rho))$, $l(S(\zeta, \rho))$ and $w(S(\zeta, \rho))$ be
respectively the expectations of $U(Z, S(\zeta, \rho))$,
$L(Z, S(\zeta, \rho))$ and $W(Z, S(\zeta, \rho))$.

Lemma 5.3.1 Under the notation above,

    \lim_{\rho \to 0} u(S(\zeta, \rho)) = \lim_{\rho \to 0} l(S(\zeta, \rho)) = E f_1(aZ + b_0) b_1

Proof of Lemma 5.3.1 Notice that $f_1(x)$ is a nonincreasing function of $x$
(by Assumption 4). Let
$d_1 = |f_1((a_0 - \rho)Z + b_0 - \rho)|$,
$d_2 = |f_1((a_0 + \rho)Z + b_0 + \rho)|$,
$d_3 = |f_1((a_0 + \rho)Z + b_0 - \rho)|$,
$d_4 = |f_1((a_0 - \rho)Z + b_0 + \rho)|$.
Then $U(Z, S(\zeta, \rho)) \le (d_1 + d_2)(|b_1| + \rho)$ whenever $Z \ge 0$,
$U(Z, S(\zeta, \rho)) \le (d_3 + d_4)(|b_1| + \rho)$ whenever $Z < 0$, and
$L(Z, S(\zeta, \rho)) \ge -(d_1 + d_2)(|b_1| + \rho)$ whenever $Z \ge 0$,
$L(Z, S(\zeta, \rho)) \ge -(d_3 + d_4)(|b_1| + \rho)$ whenever $Z < 0$. Hence
Lemma 5.3.1 follows from Assumption 3 and the use of the monotone convergence
theorem.

Lemma 5.3.2 Under the notation above,

    \lim_{\rho \to 0} w(S(\zeta, \rho)) = E \log[ f(a_0 Z + b_0) a_0 ]

Proof of Lemma 5.3.2 This follows directly from Assumption 5 and the use of
the monotone convergence theorem.

Since we need repeated use of the following result in this section and in the
next section, we write it down as a lemma. The proof is straightforward, and
is omitted.




Lemma  5.3.3       Let  {X,Xn,n  >  1}  be  i.i.d,  E\X\  <  oo,  then 
for  an  =  0(1). 

Proof of Theorem 5.3.1 We first prove this theorem for $p = 1$, which is the
simple linear regression case. The proof for general $p$ is similar.
It is equivalent to show that for any $\epsilon > 0$

p{Hm  ||0-flo||<e}  =  l  (5.3.1) 

If (5.3.1) is not true, then there exist infinitely many $n$ such that
$\|\hat\theta - \theta_0\| > \epsilon$. This implies

sup      L(0)/L(d0)  >  1  (5.3.2) 
\\0-eo\\>t,oee 

for infinitely many $n$. Let

a  =  a0/a,            &=(#>-  0)/o  =  (&o,  (5.3.3) 

Then  there  exist  M0(>  0),  e0(>  0)  such  that  {0;  \\  6 - 0O  ||>  c  and  0  G  6}  C  D,  where 
D  =  {(a,  60, 6i)  6  fl3;  e0  <  (a  -  l)2  +  6g  +  6?  <  M0}.  Now, 

n 

/o#7"  ((9)  -  logKOo)   =   Y\lo9f(aZi  +  b0  +  b,xu)a  -  logf(Z;)} 

n 

=   J2ilo9f(aZi  +  bo  +  biXi)a-logf(Zi)] 

i=l 

+  £ /, (aZi  +  60  +  Mi)&i (*„  -  +  ± £  /2(C)6?(^i,  -  *i)2 
=  I +  11  + HI  (5.3.4) 




where  X\  =  lim^^  £"=1  xxi,  Q  is  a  intermediate  value.  One  may  recall  that  Z{  = 
ZiVQ)  =  ~  /(*)•  Without  loss  of  generality,  we  assume  Xi  =  0.  For  any  point 

Co  £  (do, bo,b\)  =  D,  we  associate  a  radius  Co  >  0,  such  that 

Elogf(Z)  -  u/(S(Co,  Po))  >  *o  (5.3.5) 

«(5(Co,Po))-/(5(Co,Po))<^  (5.3.6) 

for  some  d0  >  0,  where  M  =  max\xu\.  The  existence  of  such  a  8Q  is  guaranteed  by 
Lemma  1  and  Lemma  5.3.2  and  the  fact  (using  Assumption  2):  Elog[f  {clqZ +bo)a0]  < 
Elogf(Z). 

Since  D  is  a  compact  set,  then  there  exist  a  finite  number  of  points  £?•  =  (aj,  £>oj>  &ij), 
radius  p,  and  5j  >  0,  j  =  1, . . .  ,  m,  such  that  (5.3.5)  and  (5.3.6)  hold  when  (Co,  Po,  <^o) 
is  substituted  by  (Q,  Pj,  6j)  for  j  =  1, . . .  ,  m,  and  also  D  C  U™=1S(Q,  Pj).  Using  SLLN 
and  Lemma  5.3.3,  we  have 

l-t\W{ZuS{^Pj))  -  hgfiZi)]  ±=>  u»(5(0,ft))  "  < 

1  n  5 
limn-.oo-^^Zi.SCCj-.Pi))  -  L(ZuS{Cj,pj))]xuI{xu>o}  <  ~k 

^^^(Cp^Xh^O. 

n  i=l 




Hence,  from  (5.3.4) 

sup\\0-(>o\\>t,eee[logL(6)  -  logL(60)] 

n  n 

<  sup{aMM)€D^{log{f{aZi  +  b0)a)  -  logf(Zi)}  +  £  U{aZ{  +  b0)bixu 

?=i  i=i 

+^E/2(C)ft?4] 

<  max^n  jr{W(Zh  S(Q,  Pj))  -  logfiZi)} 

i=l 
n 

+    ma^Kj^rn  £{*7(Zt,  S{Q,  Pj))  -  L{Zh  S{Q,  Pj))}^iJ{xu>o} 

n 

+    maxx<j<m  J2  L(Zi,  S^,  Pj))xu 

i=l 

a.e. 

— OO. 

Equivalently,  p{sup\\e_g0\\>(f€QL(9)/L(90)  =  0}  =  1,  which  is  contradicted  with 
(5.3.2).  So  the  theorem  is  proved  when  p  =  1.  For  general  p  >  1,  notice  that  II 
in  (5.3.4)  becomes  53"=i  fi{aZi(9o)  +b0)bjXj,  hence  the  theorem  follows  by  con- 
sidering convergence  of  sum2=1fi(aZi(9o)  +  bo)bjXj  for  each  j  =  1,  oc,p, separately. 

5.4    Asymptotic  Expansions  of  Posterior  Probability 

Let $\phi_A(h)$ be the pdf of a multivariate normal distribution with zero mean and covariance matrix
$A$, and let $p(\omega|y)$ be the posterior density function of $\omega$ given $y$, where $y = (y_1, y_2, \ldots, y_n, \ldots)$ is
the observed sequence and $\omega$ is defined in Section 2. Denote by $I_{p+2}$ the $(p+2) \times (p+2)$
identity matrix. Below is our major theorem.

Theorem 5.4.1 Let $\theta_0 \in \Theta$ be fixed. In addition to Assumptions 1-7, we assume
$\pi(\theta_0) > 0$. Then there exist constants $M$ and $N_y$ such that

\[ \Big| \int_G [p(\omega|y) - \phi_{I_{p+2}}(\omega)(1 + P_r)]\, d\omega \Big| \le M n^{-(r+1)/2} \]

uniformly for all $(p+2)$-dimensional Borel sets $G$ and for all $n > N_y$ on an almost
sure set $S$ ($P_{\theta_0}(y \in S) = 1$), where $M$ depends on $r$, $N_y$ depends on $r$ and $y$,
$P_r = n^{-1/2}\delta_1 + \ldots + n^{-r/2}\delta_r$, the $\delta_i$ are polynomials in $\omega$ of order $3i$, and all coefficients of the
polynomials are bounded.

Remark 5.4.1 Johnson (1970) derived the asymptotic expansion of the posterior distribution
in the one-parameter, i.i.d. case. As pointed out by J.K. Ghosh et al. (1982), his expansion is
valid for the posterior probability of any one-dimensional Borel set. Theorem 5.4.1
extends these ideas to the multiparameter, non-i.i.d. case.
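
The rate in such an expansion can be checked numerically in a toy setting. The sketch below is not the regression model of this chapter: it assumes i.i.d. exponential data with a flat prior (an illustrative choice), where the standardized posterior has a closed form, and it computes the L1 distance between that posterior and the standard normal density. With no correction terms (r = 0), the distance should shrink roughly like $n^{-1/2}$. The function name `l1_error` and the grid settings are hypothetical.

```python
import math

def l1_error(n, grid=20001, width=10.0):
    """L1 distance between the standardized posterior density and N(0, 1).

    Assumed toy model: i.i.d. exponential data with rate theta and a flat
    prior, so the posterior of w = sqrt(n) * (theta - mle) / mle is
    proportional to (1 + w / sqrt(n))**n * exp(-sqrt(n) * w) on w > -sqrt(n).
    """
    root = math.sqrt(n)
    lo = max(-width, -root + 1e-9)      # stay inside the posterior support
    step = (width - lo) / (grid - 1)
    ws = [lo + i * step for i in range(grid)]
    post = [math.exp(n * math.log1p(w / root) - root * w) for w in ws]
    norm = sum(post) * step             # Riemann-sum normalizing constant
    phi = [math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi) for w in ws]
    return sum(abs(p / norm - q) for p, q in zip(post, phi)) * step

errors = {n: l1_error(n) for n in (25, 100, 400)}
for n, err in errors.items():
    # The last column, err * sqrt(n), should stay roughly constant.
    print(n, round(err, 4), round(err * math.sqrt(n), 3))
```

Quadrupling n should roughly halve the error, which is the behaviour expected from a bound of order $n^{-(r+1)/2}$ with r = 0.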

Let $S_1 = \{y : \lim_{n\to\infty} \hat\theta = \theta_0\}$. From Theorem 5.3.1 we know that $S_1^c$ is a null
set.

We start the proof of Theorem 5.4.1 with several technical lemmas.

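Lemma 5.4.1 For arbitrary $\epsilon > 0$, there exists $\delta_1$ ($> 0$) such that for each $y \in S_2$
(where $S_2^c$ is a null set defined later) there is an $N_{1y}$ such that, whenever $n > N_{1y}$,

uniformly for $\|\theta - \theta_0\| < \delta_1$, $\theta \in \Theta$, $m = 2, 3, \ldots, r+3$, where $A^m_{j_0 \ldots j_{p+1}}(\theta)$ and
$e^m_{j_0 \ldots j_{p+1}}(\theta_0)$ are defined in Section 2.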

Proof  of  Lemma  5.4.1 

<   I^Lj^W  -  Al  ,p+1  (B)\  +  \Al_jp+i{6Q)  -  e£...Jp+1(0o)|  =  /  +  // 




Now 

i     n     p  jp+i 
°     t=l  e=0  fc=0 

1         i      n     p  jp+i 

+  l^r  -  ^rl  E(n  4)  E  l/^WII^-*^ 

°0        a      ,=i  e=o  fc=0 

where  a,  6*  and  c4Jp+1,  Z»  =  Zj(0o)  are  defined  respectively  in  (5.3.3)  and  in  section  2. 

Further,  let      =  then  for  small  8  >  0,  there  exists      >  0,  such  that 

$(a - 1)^2 + \eta_i^2 < \delta^2$, for $i = 1, 2, \ldots$, whenever $\|\theta - \theta_0\| < \delta_1'$, $\theta \in \Theta$. Thus

|/,(aZi  +  »?j)(aZi  +  77ir-/(f(Zi)Z?| 

<  |/,(oZi  +  »7i)  -  MZJWaZi  +  rii]'  +  l/^IKaZi  +  ^r  -  2/| 

<  SH^Z^i  S  )\Zi\<  +  62*+2\fq(ZMr  +  Wi\'  +  Yl(  ' 

e=0    e  e=0  e 

for $q > s = 1, 2, \ldots, r+3$, where we have applied i.) of Assumption 1 in the last
line above. Hence, from Lemma 5.3.3 and Assumptions 1 and 6, we can see that there
exist $\delta_1$ ($> 0$) and $N_y$ such that $|I| < \epsilon n/2$ uniformly whenever $\|\theta - \theta_0\| < \delta_1$, $\theta \in \Theta$ and
$n > N_y$. On the other hand, $\frac{1}{n} II \to 0$ by Lemma 5.3.3 and Assumptions 1 and 6.

Finally, we define $S_2$ as the set of all observed sequences $y$ such that all related
limits above converge. The lemma follows.

Lemma 5.4.2 : Let $M$ be a given $q \times q$ positive definite matrix. Then there exists $\epsilon > 0$
such that for any symmetric matrix $A$, whenever $|A_{ij} - M_{ij}| < \epsilon$, $i, j = 1, 2, \ldots, q$,
we have

where $\lambda_j$ is the $j$th largest eigenvalue of $M$.



Proof  of  Lemma  5.4.2  The  proof  is  straightforward  and  is  omitted. 

We  can  see  that  -^In(9o)  converges  to  a  positive  definite  matrix  under  Assumption 
6,  which  is 


^     d\X         0,2  j 


1 


where $a_0$, $a_1$ and $a_2$ are given in Section 2.

Lemma 5.4.3 : Under Assumptions 1-6, there exists a $\delta_2$ ($< 1$) such that

±(\ogL(6  +  rt)  -  \ogL(§))  <  -\\\V\\2^ 
for $\|\eta\| < \delta_2$, $\eta + \hat\theta \in \Theta$, whenever $n > N_{2y}$ ($> N_{1y}$), for each $y \in S_1 \cap S_2$.

Proof  of  Lemma  5.4.3  For  y  G  Si  n  S2,  choose  S2  >  0  and  N2v,  such  that  |A  ^~2A  .  - 

I  <  3AK'  \\0  +  V-0o\\  <  Si  and  is  small  enough,  whenever  |M|  <  S2 

and $n > N_{2y}$, where $\delta_1$ appeared in Lemma 5.4.1. The existence of such a $\delta_2$ and
$N_{2y}$ is guaranteed by Lemma 5.4.2 and Theorem 5.3.1. Then

l(\og L(6  +  V)-  log L{9))   <   -(log  L(9  +  rj)  -  log  L(90)) 
n  n 

-   !„'(-( A2         WW      )n  I  n'1  dl0^e») 

<  4a»mi2 

where $\theta^*$ is an intermediate vector between $\theta_0$ and $\hat\theta + \eta$.


The  next  Lemma  shows  that  the  posterior  mass  outside  of  a  compact  set  may  be 
neglected. 




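Lemma 5.4.4 : Under Assumptions 2-6, there exist an $\epsilon > 0$, a set $S_3 \subset S_1 \cap S_2$
whose complement $S_3^c$ is a null set, and for $y \in S_3$, an $N_{3y}$ ($> N_{2y}$) such that

\[ \frac{1}{n}(\log L(\hat\theta + \eta) - \log L(\hat\theta)) < -\epsilon \]

for $\|\eta\| > \delta_2$, $\eta + \hat\theta \in \Theta$ whenever $n > N_{3y}$.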
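
A bound of this type can be seen explicitly in a one-parameter toy model. The sketch below assumes i.i.d. exponential data and a flat prior (an illustrative assumption, not the regression setting of this chapter). Writing $\theta = \hat\theta(1 + u)$, the scaled log-likelihood difference is $\log(1+u) - u$, which does not depend on n and stays below a negative constant once $|u| \ge \delta$. The names `scaled_gap` and `tail_mass` are hypothetical.

```python
import math

def scaled_gap(u):
    """(1/n) * [log L(mle * (1 + u)) - log L(mle)] for i.i.d. exponential data."""
    return math.log1p(u) - u

def tail_mass(n, delta, cells=40000, upper=6.0):
    """Posterior mass of {|theta / mle - 1| > delta} under a flat prior on theta."""
    step = (upper + 1.0) / cells
    us = [-1.0 + (i + 0.5) * step for i in range(cells)]  # midpoints of (-1, upper)
    dens = [math.exp(n * scaled_gap(u)) for u in us]
    outside = sum(d for u, d in zip(us, dens) if abs(u) > delta)
    return outside / sum(dens)

delta = 0.5
eps = delta - math.log1p(delta)            # equals -scaled_gap(delta), positive
grid = [-0.999 + 0.001 * i for i in range(7000)]
worst = max(scaled_gap(u) for u in grid if abs(u) >= delta)
masses = [tail_mass(n, delta) for n in (25, 100, 400)]
print(eps, worst, masses)
```

Because the gap is bounded away from zero outside the neighborhood, the likelihood ratio there is at most $e^{-n\epsilon}$, so the posterior mass outside the neighborhood vanishes quickly as n grows.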

Proof of Lemma 5.4.4 This follows directly from the proof of Theorem 5.3.1.

The posterior density of $h$ given $y$ is proportional to

\[ \frac{L(\hat\theta + n^{-1/2}h)}{L(\hat\theta)}\,\pi(\hat\theta + n^{-1/2}h). \tag{5.4.1} \]


This density function can be approximated through Taylor expansions of both
$\pi(\hat\theta + n^{-1/2}h)$ and $\log L(\hat\theta + n^{-1/2}h)$ at $\hat\theta$. Before establishing certain properties of the function
in (5.4.1), we need more notation for those Taylor expansions. Let:

\[ \pi_r = \pi(\hat\theta) + \sum_{k=1}^{r} n^{-k/2} d_k(\hat\theta) \]


where  dk{9)  =  ±  Sfe+...+Wl=fc(9^gWl  )(Ilg  ft). 


fc=l 

_  L 


where  bk(9)  =  ^  Z3o+...+jp+1=k+2(iA^jp+i  (9)(Wtl  ft*)),  #  =  ih>  •  •  •  ,  V0-  B 
and  A*+2  jp+1  (9)  are  given  in  section  2. 

We will treat $M$ as a generic constant in what follows.



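Lemma 5.4.5 Assume Assumptions 1-6 hold. Then there exist a constant $M$ and, for each
$y \in S_3$, an $N_{4y}$ ($> N_{3y}$) such that

\[ \int_{\|h\| < n^{1/2}\delta_2} \Big| \exp\{l_{r+1}\}\pi_r - \frac{L(\hat\theta + n^{-1/2}h)}{L(\hat\theta)}\,\pi(\hat\theta + n^{-1/2}h) \Big|\, dh \le M n^{-(r+1)/2} \]

where $\delta_2$ is defined in Lemma 5.4.3.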

Proof  of  Lemma  5.4.5 


A  1 

\exp{lr+l}nr  -  ,  Ln{9  +  n  *h)\ 

.L(e  +  n-2h).  _.u  Tc    .     L{0  +  n-$h). 

=    /  +  // 

For $\|h\| < n^{1/2}\delta_2$, since $d_{r+1}(\theta)$ is bounded for $\theta$ in a neighborhood of $\theta_0$, by using
Lemma 5.4.3 and Assumption 7, we have


/  <  Me"'Aw  l|/l"2  ||/l||r+1n-r*1. 


Choose  small  e  in  Lemma  5.4.1,  such  that  e  <  -j^A^J^.  Then 


//   <   ML{9  +  l2k)  1  exp{n-^(6r+1(^)  -  br+l(§*))}  -  1| 
<  Mexp{-lA^||/1|Hexp{C^|(^+i(^V6r+i(^))n-£fl 


< 


Mexp{-iA^0))||^l|2}exp{2e^1||/l||2}n-I*1||/l|r+3 


<  Mexpi-iA^II/.inil/iir^n-^ 

where  #*,(*  are  intermediate  values.  We  have  used  inequalities  {nr**1  (br+i(d)  - 
K+i{0*))\  <  2e||/i||r+3n-£*1  <  2e||h||2<^+1.  Then  there  exists  an  N4y(>  N3y),  such 




that 


rf  lA  L(8  +  n  2J1)  -  _i,m 
exp{/r+1}7rr-  — j — ltt{9  +  ti  'h)\ 


1 


<   Mexp{-|A^)||/l||2}(||/l|r+3  +  ||/i|r1)n-" 
Hence  the  lemma  follows. 

Next we construct the asymptotic expansion for the posterior distribution of $h$ given $y$
through repeated use of Taylor expansions, keeping terms with order less than $n^{-(r+1)/2}$.

Step 1: For $\|h\| < n^{1/2}\delta_2$, where $\delta_2$ appeared in Lemma 5.4.3,

r+l  r+1 

h/;r+1|  <  £n -*\h0)\  <  Mn-\\hfY,%-\ 

fc=l  k=\ 

Hence 

|e-^tB*(l  +  </v+1  +  . . .  +  -t^+1)  -  exp{fr+1}|  <  M^O^V^^3^ 


r  +  l 


r! 


for each $y \in S_4$ ($\subset S_3$), where $S_4^c$ is a null set, some $N_{5y}$ ($> N_{4y}$) and $n > N_{5y}$.
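Step 2: Collect terms in $l'_{r+1} + \ldots + \frac{1}{r!}\, l'^{\,r}_{r+1}$ by order of $n^{-1/2}$, and denote $P_r' = c_1 n^{-1/2} +
\ldots + c_r n^{-r/2}$, where the $c_i$ are polynomials in $h$ of order $3i$, and all coefficients of the
polynomials are bounded. It is easy to see:

\[ \int_{\|h\| < n^{1/2}\delta_2} \exp\{-\tfrac{1}{2}h'Bh\}\, \Big| 1 + l'_{r+1} + \ldots + \frac{1}{r!}\, l'^{\,r}_{r+1} - (1 + P_r') \Big|\, dh \le M n^{-(r+1)/2} \]

for some $N_{6y}$ ($> N_{5y}$) and $n > N_{6y}$.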

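Step 3: Collect terms in $(1 + P_r')\pi_r$ by order of $n^{-1/2}$:

\[ (1 + P_r')\pi_r = \pi(\hat\theta)(1 + P_r'' + R^*), \]

where $P_r'' = n^{-1/2}t_1 + \ldots + n^{-r/2}t_r$, the $t_i$ are polynomials in $h$ of order $3i$, and all coefficients
are bounded. It is straightforward to show that

\[ \int_{\|h\| < n^{1/2}\delta_2} \Big| \exp\{-\tfrac{1}{2}h'Bh\}\,\pi(\hat\theta)(1 + P_r'') - \exp\{-\tfrac{1}{2}h'Bh\}(1 + P_r')\pi_r \Big|\, dh \le M n^{-(r+1)/2} \]

for some $N_{7y}$ ($> N_{6y}$) and $n > N_{7y}$.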
Now  we  prove  our  major  theorem: 

Proof of Theorem 5.4.1 Let $y \in S = S_4$, where $S_4$ is defined in Step 1 above. From the results
in Steps 1-3 and Lemma 5.4.5, we have


r         M6  +  n  *h)    $  +  n_x^  _  exp|_Ut^}(1  +  p^\dh  <  Mn-^ 

J\\h\\<nt62  L{9)  2 

for $n > N_{7y}$. On the other hand, from Lemma 5.4.4, we have

a  1 

L    *    |  W  ±£-%(9  +  n-h)  -  exp{- W}0  +  Ol^ 

•/||/i||>n*<52  L(0)  2 

<  /  \L^  +  U  2k\n(e  +  n-h)dh+  I  expi-l^Bh^l  +  P^dh 

~    J\\h\\>nh2*        L{6)  ^||h||>n*«2  2 

<  Mn  2 


for some $N_y$ ($> N_{7y}$) and $n > N_y$. Hence for any $(p+2)$-dimensional Borel set $G$,

(5.4.2) 


for some $N_y$ and $n > N_y$. In particular,

,  I     L{9  +  "-*h)%0  +  n-ih)  _  expf-i^Xl  +  P?)]dh\  <  MtT** 
jRP+        L^  2  (5.4.3) 

The  theorem  follows  from  (5.4.2),  (5.4.3)  and  by  formal  division  and  a  transformation. 




5.5    Further  Discussions 


Assumptions 1-5 are satisfied by many continuous log-concave density
functions, including the normal, gamma, logistic and Weibull densities. The log-concavity
assumption on the density function $f(\cdot)$ (Assumption 4) is imposed for technical
reasons, and is used only to prove the strong consistency of the maximum likelihood
estimator. If this condition could be lifted, our results would apply to a richer
family of distributions. For example, the very important t-family is not
log-concave. This poses the interesting question of whether the maximum likelihood
estimator is strongly consistent for the t distribution under our setting.
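
The contrast between log-concave densities and the t-family can be checked numerically. The following is a minimal sketch, assuming a finite grid and a small tolerance (both arbitrary choices): it tests whether the second differences of $\log f$ are non-positive. The helper names are hypothetical, and normalizing constants are dropped where they do not affect concavity.

```python
import math

def log_normal(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def log_logistic(x):
    # log of exp(-x) / (1 + exp(-x))**2, written symmetrically for stability
    return -abs(x) - 2.0 * math.log1p(math.exp(-abs(x)))

def log_student_t(x, df=3.0):
    # normalizing constant omitted; it does not affect concavity
    return -0.5 * (df + 1.0) * math.log1p(x * x / df)

def is_log_concave(logf, lo=-8.0, hi=8.0, points=1601, tol=1e-9):
    """Check that second differences of log f are non-positive on a grid."""
    step = (hi - lo) / (points - 1)
    vals = [logf(lo + i * step) for i in range(points)]
    return all(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] <= tol
               for i in range(1, points - 1))

for name, logf in [("normal", log_normal), ("logistic", log_logistic),
                   ("t(3)", log_student_t)]:
    print(name, is_log_concave(logf))
```

For the t density with $\nu$ degrees of freedom, $\log f$ is convex for $|x| > \sqrt{\nu}$, which is why the check fails in the tails.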

We refer to Skovgaard (1981) for the validity of Edgeworth expansions. The validity
of probability matching priors in Ghosh and Mukerjee (1991, 1992), Mukerjee and
Dey (1993), Mukerjee and Ghosh (1996), etc., may be justified for the location-scale linear
regression model. Furthermore, for the parameter of interest $\beta'X_0$ or $\sigma$, where $X_0$ is any
known $(p+1)$-vector, the prior $\pi(\cdot) \propto 1/\sigma$ will ensure second order matching based both on
posterior quantiles and on inversion of conditional likelihood ratio statistics.
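
Probability matching for the prior proportional to $1/\sigma$ can be illustrated by simulation in the simplest special case. The sketch below assumes an intercept-only model with normal errors (an assumption made for illustration; in that case the matching for $\sigma$ is exact rather than second order). It estimates the frequentist coverage of the 95% upper posterior limit for $\sigma$. The function names, sample sizes and seed are hypothetical.

```python
import math
import random

def upper_posterior_limit_sigma(ys, level, draws, rng):
    """Monte Carlo posterior quantile of sigma under the prior 1/sigma."""
    n = len(ys)
    mean = sum(ys) / n
    ss = sum((y - mean) ** 2 for y in ys)
    # With the location integrated out, ss / sigma^2 is chi-square(n - 1)
    # a posteriori; a chi-square(k) draw is 2 * Gamma(k / 2, scale 1).
    sims = sorted(math.sqrt(ss / (2.0 * rng.gammavariate((n - 1) / 2.0, 1.0)))
                  for _ in range(draws))
    return sims[max(0, round(level * draws) - 1)]

def coverage(n=10, level=0.95, reps=400, draws=400, sigma=2.0, seed=1):
    """Fraction of simulated samples whose posterior limit covers sigma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ys = [rng.gauss(1.0, sigma) for _ in range(n)]
        if sigma <= upper_posterior_limit_sigma(ys, level, draws, rng):
            hits += 1
    return hits / reps

print(coverage())
```

The printed frequency should be close to the nominal 0.95, up to Monte Carlo error from both the replications and the posterior draws.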


CHAPTER  6 
SUMMARY  AND  FUTURE  RESEARCH 

6.1  Summary 

In this dissertation, we have derived first order and second order probability
matching priors and reference priors in multivariate linear calibration, and for generalized
Fieller-Creasy problems, slope-ratio problems and, more generally, for the problem
of estimating a ratio of two linear combinations of coefficients in general multiple
linear regression. Parametric orthogonal transformations are found to facilitate the
derivations. The properties of the posterior distributions are discussed, and explicit
forms of the posteriors are derived in the normal case. For generalized Fieller-Creasy
problems, the Bayes procedures are implemented by using Markov Chain Monte Carlo
(MCMC). For slope-ratio problems, design issues are addressed so that the posteriors
provide more accurate information about the true parameter. For the calibration problem, it
turns out that the prior of Hunter and Lamboy is a second order probability matching
prior. For general multiple linear regression, a sufficient condition is given under which
an orthogonal parametric transformation exists and, further, a second order probability
matching prior exists which does not depend on the choice of the error distribution
and the design matrix. Simulation studies and numerical examples all suggest that
the second order probability matching priors perform very well in terms of matching
the target coverage probabilities in a frequentist sense, and this performance is robust
across different error distributions, with either heavy tails or light tails.




Finally, the asymptotic expansions of the posterior probabilities in the general linear
regression model are derived. Strong consistency of the maximum likelihood estimators
of the parameters is proved.

6.2    Future  Research 

The bioequivalence problem is another ratio-type problem. Studies of this topic so
far have been restricted to the normality assumption, and most of them are based on frequentist
approaches. It would be interesting to study this problem in a Bayesian way using the second
order probability matching priors, and to compare the Bayesian results with those
obtained by the intersection-union tests and equivalence confidence sets given by
Berger and Hsu (1996).

It would also be interesting to investigate in some detail the relationship between
orthogonal parameterization (Cox and Reid, 1987) and the optimum designs
in general multiple regression.


REFERENCES 


Barndorff-Nielsen, O.E. (1983). On a formula for the distribution of the maximum likelihood
estimator. Biometrika 70, 343-65.

Berger, J.O. and Bernardo, J.M. (1989). Estimating a product of means: Bayesian analysis
with reference priors. JASA 84, 200-7.

Berger, J.O. and Bernardo, J.M. (1992a). On the development of reference priors (with
discussion). In J.M. Bernardo, J.O. Berger, A.P. Dawid & A.F.M. Smith, eds, Bayesian
Statistics IV, Oxford University Press, 35-60.

Berger, J.O. and Bernardo, J.M. (1992b). Ordered group reference priors with application
to the multinomial problem. Biometrika 79, 25-37.

Berger, J.O., Liseo, B. and Wolpert, R. (1996). Integrated likelihood methods for eliminating
nuisance parameters. Technical Report #96-7C, Dept. of Stat., Purdue University.

Berger, R.L. and Hsu, J.C. (1996). Bioequivalence trials, intersection-union tests and
equivalence confidence sets. Statistician, 11, No. 4, 283-301.

Bernardo, J.M. (1977). Inferences about the ratio of normal means: A Bayesian approach
to the Fieller-Creasy problem. In Recent Developments in Statistics, J.R. Barra et al.
(eds), 345-350. Amsterdam: North-Holland.

Bernardo, J.M. (1979). Reference posterior distributions for Bayesian inference (with
discussion). J. Roy. Statist. Soc. B 41, 113-147.

Bhattacharya, R.N. and Ghosh, J.K. (1978). On the validity of the formal Edgeworth
expansion. Ann. Statist. 6, 434-451.

Brown, P.J. (1982). Multivariate calibration (with discussion). Journal of the Royal
Statistical Society B 44, 287-321.

Brown, P.J. (1993). Measurement, Regression, and Calibration. Oxford: Clarendon Press.

Brown, P.J. and Oman, S.D. (1991). Double points in nonlinear calibration. Biometrika
78, 33-43.

Brown, P.J. and Sundberg, R. (1987). Confidence and conflict in multivariate calibration.
Journal of the Royal Statistical Society B 49, 46-57.

Chao, M.T. (197-). Strong consistency of maximum likelihood estimators when the
observations are independent but not identically distributed. Dr. Y.W. Chen's 60-year
Memorial Volume. Academia Sinica, Taipei.




Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The
American Statistician 49, No. 4, 327-35.

Cox, D.R. and Reid, N. (1987). Parameter orthogonality and approximate conditional
inference. J. R. Statist. Soc. B 49, 1-18.

Creasy, M.A. (1954). Limits for the ratio of the means. J. R. Statist. Soc. B 16, 186-94.

Darby, S.C. (1980). A Bayesian approach to parallel line bioassay. Biometrika 67, 3, 607-12.

Datta, G.S. and Ghosh, J.K. (1995a). On priors providing frequentist validity for Bayesian
inference. Biometrika 82, 37-45.

Datta, G.S. and Ghosh, M. (1995b). Some remarks on noninformative priors. J. Amer.
Statist. Assoc. 90, 1357-1363.

Datta, G.S. and Ghosh, M. (1996). On the invariance of non-informative priors. Ann. Statist.
24, 141-59.

Dawid, A.P., Stone, M. and Zidek, J.V. (1973). Marginalization paradoxes in Bayesian and
structural inference (with discussion). J. Roy. Statist. Soc. B 35, 169-233.

Durbin, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67,
311-333.

Eisenhart, C. (1939). The interpretation of certain regression methods and their use in
biological and statistical research. Ann. Math. Statist. 10, 162-184.

Fieller, E.C. (1954). Some problems in interval estimation. J. R. Statist. Soc. B 16, 175-85.

Finney, D.J. (1978). Statistical Methods in Biological Assay. London: Griffin.

Fisher, R.A. (1934). Two new properties of mathematical likelihood. Proc. R. Soc. A 144,
285-307.

Garvan, C.W. and Ghosh, M. (1997). Noninformative priors for dispersion models. In press.

Ghosh, J.K. and Mukerjee, R. (1991). Characterization of priors under which Bayesian and
frequentist Bartlett corrections are equivalent in the multiparameter case. J. of Mult.
Anal. 38, 385-93.

Ghosh, J.K. and Mukerjee, R. (1992). Bayesian and frequentist Bartlett corrections for
likelihood ratio and conditional likelihood ratio tests. J. R. Statist. Soc. B 54, 867-875.

Ghosh, J.K., Sinha, B.K. and Joshi, S.M. (1982). Expansions for posterior probability and
integrated Bayes risk. In Statistical Decision and Related Topics III (S.S. Gupta and
J.O. Berger, eds) 1, 403-456. Academic, New York.

Ghosh, M., Carlin, B.P. and Srivastava, M.S. (1995). Probability matching priors for linear
calibration. Test, vol. 4, No. 2, 333-357.

Ghosh, M. and Mukerjee, R. (1996). Recent developments on probability matching priors.
Biometrika. In press.




Ghosh, M. and Yang, M.C. (1996). Noninformative priors for the two sample normal
problem. Test 5, 145-157.

Gleser, L.J. and Hwang, J.T. (1987). The non-existence of 100(1 - a)% confidence sets of
finite expected diameters in error-in-variable and related models. Ann. Statist. 15,
1351-1362.

Hill, B.M. (1981). Discussion of "A Bayesian analysis of the linear calibration problem."
Technometrics 23, 335-338.

Hoadley, B. (1970). A Bayesian look at inverse linear regression. J. American Statist. Assoc.
65, 356-369.

Hoadley, B. (1971). Asymptotic properties of maximum likelihood estimators for the
independent not identically distributed case. Ann. Math. Statist. 42, 1977-1991.

Hunter, W.G. and Lamboy, W.F. (1981). A Bayesian analysis of the linear calibration
problem. Technometrics 23, 323-350.

Jeffreys, H. (1961). Theory of Probability, 3rd edition. Oxford: Clarendon Press.

Johnson, R.A. (1970). Asymptotic expansions associated with posterior distribution. Ann.
Math. Statist. 41, 851-864.

Kappenman, R.F., Geisser, S. and Antle, C.F. (1970). Bayesian and fiducial solutions to the
Fieller-Creasy problem. Sankhya B 32, 331-40.

Krutchkoff, R.G. (1967). Classical and inverse regression methods of calibration.
Technometrics 9, 429-439.

Kubokawa, T. and Robert, C. (1994). New perspectives on linear calibration. J. Multivariate
Analysis 50, 178-200.

Laplace, P. (1812). Théorie Analytique des Probabilités. Courcier, Paris.

Lawless, J.F. (1981). Discussion of "A Bayesian analysis of the linear calibration problem."
Technometrics 23, 334-335.

LeCam, L. (1953). On some asymptotic properties of maximum likelihood estimates and
related Bayes' estimates. Univ. of California Publications in Statistics 1, 277-330.

Lee, C.B. (1989). Comparisons of frequentist coverage probability and Bayesian posterior
coverage probability and applications. Ph.D. Thesis, Purdue Univ.

Lindley, D.V. (1956). On a measure of the information provided by an experiment. Ann.
Math. Statist. 27, 986-1005.

Lindley, D.V. (1958). Fiducial distributions and Bayes' theorem. J. Roy. Statist. Soc. B 20,
102-107.

Liseo, B. (1993). Elimination of nuisance parameters with reference priors. Biometrika 80,
295-304.




Mathew, T. and Kasala, S. (1994). An exact confidence region in multivariate calibration.
Ann. Statist. 22, No. 1, 94-105.

Mathew, T. and Zha, W.X. (1996). Conservative confidence regions in multivariate
calibration. Ann. Statist. 24, No. 2, 707-25.

McCullagh, P. and Tibshirani, R. (1990). A simple method for the adjustment of profile
likelihoods. J. Roy. Statist. Soc. B 52, 325-344.

Mendoza, M. (1988). Inferences about the ratio of linear combinations of the coefficients in a
multiple regression model. Bayesian Statistics 3 (J.M. Bernardo, M.H. DeGroot, D.V.
Lindley and A.F.M. Smith, eds.). Oxford: University Press, 705-711.

Mendoza, M. (1990). A Bayesian analysis of the slope ratio bioassay. Biometrics 46,
1059-1069.

Mendoza, M. (1996). A note on the confidence probabilities of reference priors for the
calibration model. Preprint.

Mukerjee, R. & Dey, D.K. (1993). Frequentist validity of posterior quantiles in the presence
of a nuisance parameter: Higher order asymptotics. Biometrika 80, 499-505.

Mukerjee, R. and Ghosh, M. (1996). Second order probability matching priors. Biometrika.
To appear.

Neyman, J. and Scott, E. (1948). Consistent estimates based on partially consistent
observations. Econometrica 16, 1-32.

Oman, S.D. (1988). Confidence regions in multivariate calibration. Ann. Statist. 16, No. 1,
174-187.

Peers, H.W. (1965). On confidence points and Bayesian probability points in the case of
several parameters. J. R. Stat. Soc. B 27, 16-27.

Philippe, A. and Robert, C. (1994). A note on the confidence properties of reference priors
for the calibration model. Tech. Rep., Universite de Rouen.

Plessis, J.L. du, Merwe, A.J. van der and Groenewald, P.C.N. (1995). Reference priors for
the multivariate calibration problem. South African Statist. J. 29, 155-168.

Press, S.J. (1969). The t-ratio distribution. JASA, vol. 64, pp. 242-252.

Press, S.J. (1982). Applied multivariate analysis. New York: Holt, Rinehart and Winston.

Reid, N. (1995). Likelihood and Bayesian approximation methods. Bayesian Statistics 5
(J.M. Bernardo et al. eds.), pp. 351-368. Oxford Univ. Press.

Reiss, R. (1973). On the measurability and consistency of maximum likelihood estimates for
unimodal densities. Ann. Statist. 1, 888-901.

Sendra, M. (1982). Distribución final de referencia para el problema de Fieller-Creasy.
Trabajos de Estadística e Investigación Operativa 33, 55-72.




Skovgaard, Ib M. (1981). Edgeworth expansions of the distribution of maximum likelihood
estimators in the general (non i.i.d.) case. Scand. J. Statist. 8, 207-217.

Stein, C. (1985). On the coverage probability of confidence sets based on a prior
distribution. In Sequential Methods in Statistics, Banach Center Publications, 16, PWN-Polish
Scientific Publishers, Warsaw.

Stephens, D.A. and Smith, A.F.M. (1992). Sampling-resampling techniques for the
computation of posterior densities in normal means problems. Test 1, 1-18.

Sun, D. and Ye, K. (1996). Frequentist validity of posterior quantiles for a two parameter
exponential family. Biometrika 83, 55-65.

Tibshirani, R.J. (1989). Non-informative priors for one parameter of many. Biometrika 76,
604-608.

Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math.
Statist. 20, 595-601.

Welch, B.L. & Peers, H.W. (1963). On formulae for confidence points based on integrals of
weighted likelihoods. J. R. Statist. Soc. B 25, 318-29.

Williams, E.J. (1969). A note on regression methods in calibration. Technometrics 11,
189-192.


BIOGRAPHICAL  SKETCH 


The author was born in XiangTan, Hunan, China in 1965. He received the B.S. degree
in Mathematics from XiangTan University in 1986, and the M.S. degree in Statistics
from Peking University in 1989. In 1994, he transferred from the University of Montana
to the University of Florida to pursue his Ph.D. degree in the Department of Statistics.




I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.

Malay Ghosh, Chairman
Professor of Statistics


I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.

Myron Chang
Professor of Statistics


I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.


Pejaver  V.  Rao 
Professor  of  Statistics 


I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.

Geoff Vining
Professor of Statistics


I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.

Michael Delorenzo
Professor of Dairy and Poultry Sciences


This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the  Department  of 
Statistics  in  the  College  of  Liberal  Arts  and  Sciences  and  to  the  Graduate  School  and 
was  accepted  as  partial  fulfillment  of  the  requirements  for  the  degree  of  Doctor  of 
Philosophy. 


December  1997 


Dean,  Graduate  School 


